[jira] [Assigned] (PDFBOX-4130) When W entries not included in CIDFont get width from font by code, Improve display of some PDF files.

2018-02-26 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr reassigned PDFBOX-4130:
---

Assignee: Tilman Hausherr

> When W entries not included in CIDFont get width from font by code, Improve 
> display of some PDF files.
> --
>
> Key: PDFBOX-4130
> URL: https://issues.apache.org/jira/browse/PDFBOX-4130
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Rendering
>Affects Versions: 2.0.8, 3.0.0 PDFBox
>Reporter: chunlinyao
>Assignee: Tilman Hausherr
>Priority: Major
> Attachments: DateTest.pdf, after.png, before.png, diff.patch
>
>
> Some PDFs that use a CJK font without an embedded subset are displayed 
> incorrectly: the alphabetic characters become wider.
> This is before the patch.
>   !before.png!
>  
> This is after the patch.
> !after.png!
> The test file [^DateTest.pdf]
> ^This patch only gets the width from the font when there isn't a W entry in 
> the CIDFont. If there is a W entry, then any CID not in the W entries will 
> return the default width.^



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Closed] (PDFBOX-4109) Static Initialization Deadlock between COSNumber/COSInteger (2)

2018-02-26 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr closed PDFBOX-4109.
---
Resolution: Won't Fix

I just thought about this again and I am closing this as "won't fix" because 
the workaround is easy: don't use the test code alone, do something with 
PDDocument first, i.e. create or open a document.
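
The workaround can be illustrated outside PDFBox. The following is only a sketch under stated assumptions: Parent and Child are hypothetical stand-ins for a COSNumber/COSInteger-like pair whose static initializers depend on each other, and warmUpThenRace() is an invented helper name. It initializes both classes on one thread first (the role that creating or opening a PDDocument plays in the comment above), so the later concurrent access finds them already initialized.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class InitOrderSketch
{
    // Hypothetical stand-in for a parent class whose static initializer needs the child.
    static class Parent
    {
        static final Parent DEFAULT = new Child(0);
    }

    // Hypothetical stand-in for the child; it cannot initialize without the parent.
    static class Child extends Parent
    {
        static final Child ZERO = new Child(0);
        final int value;

        Child(int value)
        {
            this.value = value;
        }
    }

    /** Initializes both classes on the calling thread, then touches them from two threads. */
    static boolean warmUpThenRace() throws InterruptedException
    {
        // Warm-up: initialization on a single thread cannot deadlock.
        Object warm = Parent.DEFAULT;

        CountDownLatch done = new CountDownLatch(2);
        Thread a = new Thread(() -> { Object o = Parent.DEFAULT; done.countDown(); });
        Thread b = new Thread(() -> { Object o = Child.ZERO; done.countDown(); });
        a.setDaemon(true);
        b.setDaemon(true);
        a.start();
        b.start();
        // Both threads only read already-initialized classes, so this returns quickly.
        return done.await(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException
    {
        System.out.println("completed: " + warmUpThenRace());
    }
}
```

In real code the warm-up line would be whatever touches PDDocument first; the sketch does not reproduce PDFBox's actual static initializer.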

> Static Initialization Deadlock between COSNumber/COSInteger (2)
> ---
>
> Key: PDFBOX-4109
> URL: https://issues.apache.org/jira/browse/PDFBOX-4109
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.5, 2.0.8
>Reporter: Tilman Hausherr
>Priority: Major
>
> Written by [~jesmith3] in PDFBOX-3698:
> {code:java}
> import org.apache.pdfbox.cos.COSInteger;
> import org.apache.pdfbox.cos.COSNumber;
>
> public class PDFBox3698
> {
>     public static void main(String[] args) throws ClassNotFoundException, InterruptedException
>     {
>         // Initialize COSNumber on a second thread ...
>         Thread thread = new Thread(new Runnable() {
>             @Override
>             public void run() {
>                 try {
>                     Class.forName(COSNumber.class.getName(), true,
>                             COSNumber.class.getClassLoader());
>                 } catch (ClassNotFoundException ex) {
>                     // ignored in this reproduction
>                 }
>             }
>         });
>         thread.start();
>         // ... while the main thread initializes COSInteger at the same time.
>         Class.forName(COSInteger.class.getName(), true,
>                 COSInteger.class.getClassLoader());
>         thread.join();
>     }
> }
> {code}
> I was able to reproduce in 2.0.5 with a few executions.
> I downloaded 3.0.0-SNAPSHOT 453 and ran the test against it and I can no 
> longer reproduce the issue.






[jira] [Comment Edited] (PDFBOX-4109) Static Initialization Deadlock between COSNumber/COSInteger (2)

2018-02-26 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377593#comment-16377593
 ] 

Tilman Hausherr edited comment on PDFBOX-4109 at 2/27/18 6:25 AM:
--

Yes, PDDocument has a static initialization since 2.0.5 that avoids the 
deadlock by initializing one of the classes first. In 3.0 the problem is gone 
because one of the classes is different.

The problem is that your test code isn't realistic, because in non-test code 
one would usually access PDDocument before COSNumber / COSInteger.


was (Author: tilman):
Yes, PDDocument has a static initialization since 2.0.5 that avoids the 
deadlock by initializing one of the classes first.

The problem is that your test code isn't realistic, because in non-test code 
one would usually access PDDocument before COSNumber / COSInteger.

> Static Initialization Deadlock between COSNumber/COSInteger (2)
> ---
>
> Key: PDFBOX-4109
> URL: https://issues.apache.org/jira/browse/PDFBOX-4109
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.5, 2.0.8
>Reporter: Tilman Hausherr
>Priority: Major
>
> Written by [~jesmith3] in PDFBOX-3698:
> {code:java}
> import org.apache.pdfbox.cos.COSInteger;
> import org.apache.pdfbox.cos.COSNumber;
>
> public class PDFBox3698
> {
>     public static void main(String[] args) throws ClassNotFoundException, InterruptedException
>     {
>         // Initialize COSNumber on a second thread ...
>         Thread thread = new Thread(new Runnable() {
>             @Override
>             public void run() {
>                 try {
>                     Class.forName(COSNumber.class.getName(), true,
>                             COSNumber.class.getClassLoader());
>                 } catch (ClassNotFoundException ex) {
>                     // ignored in this reproduction
>                 }
>             }
>         });
>         thread.start();
>         // ... while the main thread initializes COSInteger at the same time.
>         Class.forName(COSInteger.class.getName(), true,
>                 COSInteger.class.getClassLoader());
>         thread.join();
>     }
> }
> {code}
> I was able to reproduce in 2.0.5 with a few executions.
> I downloaded 3.0.0-SNAPSHOT 453 and ran the test against it and I can no 
> longer reproduce the issue.






[jira] [Updated] (PDFBOX-4130) When W entries not included in CIDFont get width from font by code, Improve display of some PDF files.

2018-02-26 Thread chunlinyao (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunlinyao updated PDFBOX-4130:
---
Attachment: diff.patch

> When W entries not included in CIDFont get width from font by code, Improve 
> display of some PDF files.
> --
>
> Key: PDFBOX-4130
> URL: https://issues.apache.org/jira/browse/PDFBOX-4130
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Rendering
>Affects Versions: 2.0.8, 3.0.0 PDFBox
>Reporter: chunlinyao
>Priority: Major
> Attachments: DateTest.pdf, after.png, before.png, diff.patch
>
>
> Some PDFs that use a CJK font without an embedded subset are displayed 
> incorrectly: the alphabetic characters become wider.
> This is before the patch.
>   !before.png!
>  
> This is after the patch.
> !after.png!
> The test file [^DateTest.pdf]
> ^This patch only gets the width from the font when there isn't a W entry in 
> the CIDFont. If there is a W entry, then any CID not in the W entries will 
> return the default width.^






[jira] [Created] (PDFBOX-4130) When W entries not included in CIDFont get width from font by code, Improve display of some PDF files.

2018-02-26 Thread chunlinyao (JIRA)
chunlinyao created PDFBOX-4130:
--

 Summary: When W entries not included in CIDFont get width from 
font by code, Improve display of some PDF files.
 Key: PDFBOX-4130
 URL: https://issues.apache.org/jira/browse/PDFBOX-4130
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Affects Versions: 2.0.8, 3.0.0 PDFBox
Reporter: chunlinyao
 Attachments: DateTest.pdf, after.png, before.png

Some PDFs that use a CJK font without an embedded subset are displayed 
incorrectly: the alphabetic characters become wider.

This is before the patch.

  !before.png!

 

This is after the patch.

!after.png!

The test file [^DateTest.pdf]

^This patch only gets the width from the font when there isn't a W entry in the 
CIDFont. If there is a W entry, then any CID not in the W entries will return 
the default width.^
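
The rule in the last paragraph can be sketched as follows. This is not the code from diff.patch; CidWidthSketch, widthForCid and the fontWidth callback are hypothetical names, and a null map stands for a CIDFont without a W entry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntToDoubleFunction;

public class CidWidthSketch
{
    /**
     * @param cid       the CID to look up
     * @param w         widths from the W entry, or null if the CIDFont has no W entry (assumption)
     * @param dw        the default width
     * @param fontWidth hypothetical callback that asks the font program for a width
     */
    static double widthForCid(int cid, Map<Integer, Double> w, double dw,
                              IntToDoubleFunction fontWidth)
    {
        if (w == null)
        {
            // No W entry at all: fall back to the width stored in the font itself.
            return fontWidth.applyAsDouble(cid);
        }
        // A W entry exists: CIDs it does not list get the default width.
        Double explicit = w.get(cid);
        return explicit != null ? explicit : dw;
    }

    public static void main(String[] args)
    {
        Map<Integer, Double> w = new HashMap<>();
        w.put(1, 250.0);
        System.out.println(widthForCid(1, w, 1000.0, c -> 500.0));
        System.out.println(widthForCid(2, w, 1000.0, c -> 500.0));
        System.out.println(widthForCid(2, null, 1000.0, c -> 500.0));
    }
}
```

The fallback is deliberately limited to the "no W entry" case, so fonts that do carry a W entry keep their current behaviour.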






[jira] [Commented] (PDFBOX-4109) Static Initialization Deadlock between COSNumber/COSInteger (2)

2018-02-26 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377593#comment-16377593
 ] 

Tilman Hausherr commented on PDFBOX-4109:
-

Yes, PDDocument has a static initialization since 2.0.5 that avoids the 
deadlock by initializing one of the classes first.

The problem is that your test code isn't realistic, because in non-test code 
one would usually access PDDocument before COSNumber / COSInteger.

> Static Initialization Deadlock between COSNumber/COSInteger (2)
> ---
>
> Key: PDFBOX-4109
> URL: https://issues.apache.org/jira/browse/PDFBOX-4109
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.5, 2.0.8
>Reporter: Tilman Hausherr
>Priority: Major
>
> Written by [~jesmith3] in PDFBOX-3698:
> {code:java}
> import org.apache.pdfbox.cos.COSInteger;
> import org.apache.pdfbox.cos.COSNumber;
>
> public class PDFBox3698
> {
>     public static void main(String[] args) throws ClassNotFoundException, InterruptedException
>     {
>         // Initialize COSNumber on a second thread ...
>         Thread thread = new Thread(new Runnable() {
>             @Override
>             public void run() {
>                 try {
>                     Class.forName(COSNumber.class.getName(), true,
>                             COSNumber.class.getClassLoader());
>                 } catch (ClassNotFoundException ex) {
>                     // ignored in this reproduction
>                 }
>             }
>         });
>         thread.start();
>         // ... while the main thread initializes COSInteger at the same time.
>         Class.forName(COSInteger.class.getName(), true,
>                 COSInteger.class.getClassLoader());
>         thread.join();
>     }
> }
> {code}
> I was able to reproduce in 2.0.5 with a few executions.
> I downloaded 3.0.0-SNAPSHOT 453 and ran the test against it and I can no 
> longer reproduce the issue.






[jira] [Resolved] (PDFBOX-4129) Deleted fonts not detected when checking cache

2018-02-26 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-4129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-4129.
-
   Resolution: Fixed
Fix Version/s: 3.0.0 PDFBox
   2.0.9

> Deleted fonts not detected when checking cache
> --
>
> Key: PDFBOX-4129
> URL: https://issues.apache.org/jira/browse/PDFBOX-4129
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.8
>Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>Priority: Major
> Fix For: 2.0.9, 3.0.0 PDFBox
>
>
> FileSystemFontProvider doesn't detect when a font in the cache is removed, 
> which results in an error if such a font "matches".






[jira] [Commented] (PDFBOX-4129) Deleted fonts not detected when checking cache

2018-02-26 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377584#comment-16377584
 ] 

ASF subversion and git services commented on PDFBOX-4129:
-

Commit 1825416 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1825416 ]

PDFBOX-4129: skip deleted fonts

> Deleted fonts not detected when checking cache
> --
>
> Key: PDFBOX-4129
> URL: https://issues.apache.org/jira/browse/PDFBOX-4129
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.8
>Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>Priority: Major
>
> FileSystemFontProvider doesn't detect when a font in the cache is removed, 
> which results in an error if such a font "matches".






[jira] [Commented] (PDFBOX-4129) Deleted fonts not detected when checking cache

2018-02-26 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377585#comment-16377585
 ] 

ASF subversion and git services commented on PDFBOX-4129:
-

Commit 1825417 from [~tilman] in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1825417 ]

PDFBOX-4129: skip deleted fonts

> Deleted fonts not detected when checking cache
> --
>
> Key: PDFBOX-4129
> URL: https://issues.apache.org/jira/browse/PDFBOX-4129
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.8
>Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>Priority: Major
>
> FileSystemFontProvider doesn't detect when a font in the cache is removed, 
> which results in an error if such a font "matches".






[jira] [Created] (PDFBOX-4129) Deleted fonts not detected when checking cache

2018-02-26 Thread Tilman Hausherr (JIRA)
Tilman Hausherr created PDFBOX-4129:
---

 Summary: Deleted fonts not detected when checking cache
 Key: PDFBOX-4129
 URL: https://issues.apache.org/jira/browse/PDFBOX-4129
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.8
Reporter: Tilman Hausherr
Assignee: Tilman Hausherr


FileSystemFontProvider doesn't detect when a font in the cache is removed, 
which results in an error if such a font "matches".
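
A minimal sketch of the idea behind the fix ("skip deleted fonts"), assuming the cache can be viewed as a list of font files. FontCacheSketch and dropDeleted are hypothetical names and do not mirror the actual FileSystemFontProvider code.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FontCacheSketch
{
    /** Returns only those cached entries whose file still exists on disk. */
    static List<File> dropDeleted(List<File> cachedFontFiles)
    {
        List<File> alive = new ArrayList<>();
        for (File file : cachedFontFiles)
        {
            if (file.exists())
            {
                alive.add(file);
            }
            // A missing file is skipped, so it can never "match" later.
        }
        return alive;
    }

    public static void main(String[] args)
    {
        List<File> cached = Arrays.asList(
                new File(System.getProperty("java.io.tmpdir")), // exists
                new File("no-such-font-4129.ttf"));             // assumed missing
        System.out.println(dropDeleted(cached));
    }
}
```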






Re: [RESULT][VOTE] Release Apache PDFBox JBIG2 ImageIO 3.0.0

2018-02-26 Thread Petr Slabý
Thanks for the great news and great work. I have added it to my dependencies 
and it downloaded fine from the apache maven repository.


I have noticed that the POMs in the PDFBox projects still reference the 
original levigo library. I guess that is the next step?


Best regards,
Petr.

-Original message- 
From: Andreas Lehmkuehler

Sent: Monday, February 26, 2018 7:09 PM
To: dev@pdfbox.apache.org
Subject: [RESULT][VOTE] Release Apache PDFBox JBIG2 ImageIO 3.0.0

On 02/21/2018 10:22 PM, Andreas Lehmkuehler wrote:
Please vote on releasing this package as Apache PDFBox JBIG2 ImageIO 
3.0.0.

  +1 Tilman Hausherr
  +1 Jörg Henne
  +1 Maruan Sahyoun
  +1 Andreas Lehmkühler

Thanks for your support and help!! I'm going to push the release out.


Andreas




[jira] [Commented] (PDFBOX-4109) Static Initialization Deadlock between COSNumber/COSInteger (2)

2018-02-26 Thread Joseph Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377545#comment-16377545
 ] 

Joseph Smith commented on PDFBOX-4109:
--

A bit of both, I suppose. I was reviewing the code that produced the original 
issue, which is in a production deployment. I was exploring the benefit of 
updating versions in order to remove our workaround for this issue and clean 
up the code a bit. 

Based on the original developer's research, the issue came down to this test, 
which is in our code base. I noticed that the issue could not be reproduced 
easily last time because the original test was written in Groovy. I created a 
Java version of the test just to help illustrate and reproduce the original 
issue.

Did something change about the initialization of PDDocument between 1.8.10 and 
2.0.8 that would have an impact?

> Static Initialization Deadlock between COSNumber/COSInteger (2)
> ---
>
> Key: PDFBOX-4109
> URL: https://issues.apache.org/jira/browse/PDFBOX-4109
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.5, 2.0.8
>Reporter: Tilman Hausherr
>Priority: Major
>
> Written by [~jesmith3] in PDFBOX-3698:
> {code:java}
> import org.apache.pdfbox.cos.COSInteger;
> import org.apache.pdfbox.cos.COSNumber;
>
> public class PDFBox3698
> {
>     public static void main(String[] args) throws ClassNotFoundException, InterruptedException
>     {
>         // Initialize COSNumber on a second thread ...
>         Thread thread = new Thread(new Runnable() {
>             @Override
>             public void run() {
>                 try {
>                     Class.forName(COSNumber.class.getName(), true,
>                             COSNumber.class.getClassLoader());
>                 } catch (ClassNotFoundException ex) {
>                     // ignored in this reproduction
>                 }
>             }
>         });
>         thread.start();
>         // ... while the main thread initializes COSInteger at the same time.
>         Class.forName(COSInteger.class.getName(), true,
>                 COSInteger.class.getClassLoader());
>         thread.join();
>     }
> }
> {code}
> I was able to reproduce in 2.0.5 with a few executions.
> I downloaded 3.0.0-SNAPSHOT 453 and ran the test against it and I can no 
> longer reproduce the issue.






[jira] [Closed] (PDFBOX-4128) Glyphes missing in text extraction

2018-02-26 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr closed PDFBOX-4128.
---
Resolution: Not A Bug

Closing as this isn't a bug. Maybe this is some obfuscation by the creator of 
the file. Send him this issue.

> Glyphes missing in text extraction
> --
>
> Key: PDFBOX-4128
> URL: https://issues.apache.org/jira/browse/PDFBOX-4128
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 2.0.8
>Reporter: Alexandre
>Priority: Minor
> Attachments: PDFDebugger-Screenshot.png, glyph-miss-1.png, 
> glyph-miss-2.png, glyph-missing.pdf
>
>
> Dear Apache contributors,
> I found some documents where glyphs are missing while extracting the text. 
> Could you confirm?
> Hand, A.






[jira] [Commented] (PDFBOX-4128) Glyphes missing in text extraction

2018-02-26 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377529#comment-16377529
 ] 

Tilman Hausherr commented on PDFBOX-4128:
-

Yes... the Unicode mappings are missing, see the log output. And if you look at 
PDFDebugger you'll also see the missing Unicode.
 !PDFDebugger-Screenshot.png! 

> Glyphes missing in text extraction
> --
>
> Key: PDFBOX-4128
> URL: https://issues.apache.org/jira/browse/PDFBOX-4128
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 2.0.8
>Reporter: Alexandre
>Priority: Minor
> Attachments: PDFDebugger-Screenshot.png, glyph-miss-1.png, 
> glyph-miss-2.png, glyph-missing.pdf
>
>
> Dear Apache contributors,
> I found some documents where glyphs are missing while extracting the text. 
> Could you confirm?
> Hand, A.






[jira] [Updated] (PDFBOX-4128) Glyphes missing in text extraction

2018-02-26 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-4128:

Attachment: PDFDebugger-Screenshot.png

> Glyphes missing in text extraction
> --
>
> Key: PDFBOX-4128
> URL: https://issues.apache.org/jira/browse/PDFBOX-4128
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 2.0.8
>Reporter: Alexandre
>Priority: Minor
> Attachments: PDFDebugger-Screenshot.png, glyph-miss-1.png, 
> glyph-miss-2.png, glyph-missing.pdf
>
>
> Dear Apache contributors,
> I found some documents where glyphs are missing while extracting the text. 
> Could you confirm?
> Hand, A.






[jira] [Commented] (PDFBOX-4128) Glyphes missing in text extraction

2018-02-26 Thread Alexandre (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377504#comment-16377504
 ] 

Alexandre commented on PDFBOX-4128:
---

I don't know what's going on, but it extracts few characters only.

Acrobat extracts with errors too. The result is like this: 

 
{code:java}
21 octobre
Show-case de Siau -
C􀁋􀁄􀁑􀁖􀁒􀁑 􀂫􀁏􀁈􀁆􀁗􀁕􀁒-􀁓􀁒􀁓􀀑 S􀁌􀁄􀁘
􀁌􀁐􀁄􀁊􀁌􀁑􀁈 􀁘􀁑􀁈 􀁓􀁒􀁓 􀂢 􀁏􀁄 􀁅􀁈􀁄􀁘􀁗
{code}
 

> Glyphes missing in text extraction
> --
>
> Key: PDFBOX-4128
> URL: https://issues.apache.org/jira/browse/PDFBOX-4128
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 2.0.8
>Reporter: Alexandre
>Priority: Minor
> Attachments: glyph-miss-1.png, glyph-miss-2.png, glyph-missing.pdf
>
>
> Dear Apache contributors,
> I found some documents where glyphs are missing while extracting the text. 
> Could you confirm?
> Hand, A.






[jira] [Commented] (PDFBOX-4128) Glyphes missing in text extraction

2018-02-26 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377492#comment-16377492
 ] 

Tilman Hausherr commented on PDFBOX-4128:
-

What is missing, and have you read
https://pdfbox.apache.org/2.0/faq.html#notext ?

> Glyphes missing in text extraction
> --
>
> Key: PDFBOX-4128
> URL: https://issues.apache.org/jira/browse/PDFBOX-4128
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 2.0.8
>Reporter: Alexandre
>Priority: Minor
> Attachments: glyph-miss-1.png, glyph-miss-2.png, glyph-missing.pdf
>
>
> Dear Apache contributors,
> I found some documents where glyphs are missing while extracting the text. 
> Could you confirm?
> Hand, A.






[jira] [Updated] (PDFBOX-4128) Glyphes missing in text extraction

2018-02-26 Thread Alexandre (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexandre updated PDFBOX-4128:
--
Priority: Minor  (was: Major)

> Glyphes missing in text extraction
> --
>
> Key: PDFBOX-4128
> URL: https://issues.apache.org/jira/browse/PDFBOX-4128
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 2.0.8
>Reporter: Alexandre
>Priority: Minor
> Attachments: glyph-miss-1.png, glyph-miss-2.png, glyph-missing.pdf
>
>
> Dear Apache contributors,
> I found some documents where glyphs are missing while extracting the text. 
> Could you confirm?
> Hand, A.






[jira] [Created] (PDFBOX-4128) Glyphes missing in text extraction

2018-02-26 Thread Alexandre (JIRA)
Alexandre created PDFBOX-4128:
-

 Summary: Glyphes missing in text extraction
 Key: PDFBOX-4128
 URL: https://issues.apache.org/jira/browse/PDFBOX-4128
 Project: PDFBox
  Issue Type: Bug
  Components: Text extraction
Affects Versions: 2.0.8
Reporter: Alexandre
 Attachments: glyph-miss-1.png, glyph-miss-2.png, glyph-missing.pdf

Dear Apache contributors,

I found some documents where glyphs are missing while extracting the text. 
Could you confirm?

Hand, A.






[jira] [Closed] (PDFBOX-4098) Prepare the JBIG2 repository for the first release

2018-02-26 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/PDFBOX-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler closed PDFBOX-4098.
--
Resolution: Fixed

I adjusted the build process itself and did the first release.

Set closed

> Prepare the JBIG2 repository for the first release
> --
>
> Key: PDFBOX-4098
> URL: https://issues.apache.org/jira/browse/PDFBOX-4098
> Project: PDFBox
>  Issue Type: Task
>  Components: JBIG2
>Affects Versions: 3.0.0 JBIG2
>Reporter: Andreas Lehmkühler
>Assignee: Andreas Lehmkühler
>Priority: Major
> Fix For: 3.0.0 JBIG2
>
>
> There are some minor adjustments to do before we can release the plugin
>  * javadoc
>  * formatting
>  * build/release artifacts






[RESULT][VOTE] Release Apache PDFBox JBIG2 ImageIO 3.0.0

2018-02-26 Thread Andreas Lehmkuehler

On 02/21/2018 10:22 PM, Andreas Lehmkuehler wrote:

Please vote on releasing this package as Apache PDFBox JBIG2 ImageIO 3.0.0.

  +1 Tilman Hausherr
  +1 Jörg Henne
  +1 Maruan Sahyoun
  +1 Andreas Lehmkühler

Thanks for your support and help!! I'm going to push the release out.


Andreas




[jira] [Closed] (PDFBOX-4127) JPEG reading fails

2018-02-26 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr closed PDFBOX-4127.
---
Resolution: Duplicate

Closing as duplicate of PDFBOX-2128 (and others). The bug is in the ImageIO 
library, not in PDFBox.

> JPEG reading fails
> --
>
> Key: PDFBOX-4127
> URL: https://issues.apache.org/jira/browse/PDFBOX-4127
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.8
>Reporter: savan patel
>Priority: Major
>  Labels: DCTFilter
> Attachments: 5a9417fbcf00c.pdf
>
>
> On page 4, reading the JPEG image gives the error "Numbers of source Raster 
> bands and source color space components do not match"...






[jira] [Updated] (PDFBOX-4127) JPEG reading fails

2018-02-26 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-4127:

Component/s: (was: Parsing)
 Rendering

> JPEG reading fails
> --
>
> Key: PDFBOX-4127
> URL: https://issues.apache.org/jira/browse/PDFBOX-4127
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.8
>Reporter: savan patel
>Priority: Major
>  Labels: DCTFilter
> Attachments: 5a9417fbcf00c.pdf
>
>
> On page 4, reading the JPEG image gives the error "Numbers of source Raster 
> bands and source color space components do not match"...






[jira] [Updated] (PDFBOX-4127) JPEG reading fails

2018-02-26 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-4127:

Affects Version/s: 2.0.8

> JPEG reading fails
> --
>
> Key: PDFBOX-4127
> URL: https://issues.apache.org/jira/browse/PDFBOX-4127
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.8
>Reporter: savan patel
>Priority: Major
>  Labels: DCTFilter
> Attachments: 5a9417fbcf00c.pdf
>
>
> On page 4, reading the JPEG image gives the error "Numbers of source Raster 
> bands and source color space components do not match"...






[jira] [Commented] (PDFBOX-4127) JPEG reading fails

2018-02-26 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377207#comment-16377207
 ] 

Tilman Hausherr commented on PDFBOX-4127:
-

Use the twelvemonkeys plugin.
{code:xml}
<dependency>
    <groupId>com.twelvemonkeys.imageio</groupId>
    <artifactId>imageio-jpeg</artifactId>
    <version>3.3.2</version>
</dependency>
{code}
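
To check whether an additional JPEG plugin is actually picked up at runtime, one can list the ImageIO readers registered for the JPEG format. This is a small sketch using only the standard javax.imageio API; the class name JpegReaderList is made up for illustration, and which reader classes appear depends on the classpath.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

public class JpegReaderList
{
    /** Returns the class names of all ImageIO readers registered for "JPEG". */
    static List<String> jpegReaderNames()
    {
        List<String> names = new ArrayList<>();
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName("JPEG");
        while (readers.hasNext())
        {
            names.add(readers.next().getClass().getName());
        }
        return names;
    }

    public static void main(String[] args)
    {
        // With a plugin jar on the classpath, its reader class should show up here.
        for (String name : jpegReaderNames())
        {
            System.out.println(name);
        }
    }
}
```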


> JPEG reading fails
> --
>
> Key: PDFBOX-4127
> URL: https://issues.apache.org/jira/browse/PDFBOX-4127
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Reporter: savan patel
>Priority: Major
>  Labels: DCTFilter
> Attachments: 5a9417fbcf00c.pdf
>
>
> On page 4, reading the JPEG image gives the error "Numbers of source Raster 
> bands and source color space components do not match"...






[jira] [Commented] (PDFBOX-4071) Improve code quality (3)

2018-02-26 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377201#comment-16377201
 ] 

ASF subversion and git services commented on PDFBOX-4071:
-

Commit 1825386 from [~tilman] in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1825386 ]

PDFBOX-4071: set current version of owasp plugin

> Improve code quality (3)
> 
>
> Key: PDFBOX-4071
> URL: https://issues.apache.org/jira/browse/PDFBOX-4071
> Project: PDFBox
>  Issue Type: Task
>Affects Versions: 2.0.8
>Reporter: Tilman Hausherr
>Priority: Major
>
> This is a longterm issue for the task to improve code quality, by using the 
> [SonarQube 
> report|https://analysis.apache.org/dashboard/index/org.apache.pdfbox:pdfbox-reactor],
>  hints in different IDEs, the FindBugs tool and other code quality tools.
> This is a follow-up of PDFBOX-2852, which was getting too long.






[jira] [Commented] (PDFBOX-4071) Improve code quality (3)

2018-02-26 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377203#comment-16377203
 ] 

ASF subversion and git services commented on PDFBOX-4071:
-

Commit 1825388 from [~tilman] in branch 'pdfbox/branches/1.8'
[ https://svn.apache.org/r1825388 ]

PDFBOX-4071: set current version of owasp plugin

> Improve code quality (3)
> 
>
> Key: PDFBOX-4071
> URL: https://issues.apache.org/jira/browse/PDFBOX-4071
> Project: PDFBox
>  Issue Type: Task
>Affects Versions: 2.0.8
>Reporter: Tilman Hausherr
>Priority: Major
>
> This is a longterm issue for the task to improve code quality, by using the 
> [SonarQube 
> report|https://analysis.apache.org/dashboard/index/org.apache.pdfbox:pdfbox-reactor],
>  hints in different IDEs, the FindBugs tool and other code quality tools.
> This is a follow-up of PDFBOX-2852, which was getting too long.






[jira] [Commented] (PDFBOX-4071) Improve code quality (3)

2018-02-26 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377202#comment-16377202
 ] 

ASF subversion and git services commented on PDFBOX-4071:
-

Commit 1825387 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1825387 ]

PDFBOX-4071: set current version of owasp plugin

> Improve code quality (3)
> 
>
> Key: PDFBOX-4071
> URL: https://issues.apache.org/jira/browse/PDFBOX-4071
> Project: PDFBox
>  Issue Type: Task
>Affects Versions: 2.0.8
>Reporter: Tilman Hausherr
>Priority: Major
>
> This is a longterm issue for the task to improve code quality, by using the 
> [SonarQube 
> report|https://analysis.apache.org/dashboard/index/org.apache.pdfbox:pdfbox-reactor],
>  hints in different IDEs, the FindBugs tool and other code quality tools.
> This is a follow-up of PDFBOX-2852, which was getting too long.






[jira] [Updated] (PDFBOX-4127) JPEG reading fails

2018-02-26 Thread savan patel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

savan patel updated PDFBOX-4127:

Issue Type: Bug  (was: Improvement)

> JPEG reading fails
> --
>
> Key: PDFBOX-4127
> URL: https://issues.apache.org/jira/browse/PDFBOX-4127
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Reporter: savan patel
>Priority: Major
>  Labels: DCTFilter
> Attachments: 5a9417fbcf00c.pdf
>
>
> On page 4, reading the JPEG image gives the error "Numbers of source Raster 
> bands and source color space components do not match"...






[jira] [Created] (PDFBOX-4127) JPEG reading fails

2018-02-26 Thread savan patel (JIRA)
savan patel created PDFBOX-4127:
---

 Summary: JPEG reading fails
 Key: PDFBOX-4127
 URL: https://issues.apache.org/jira/browse/PDFBOX-4127
 Project: PDFBox
  Issue Type: Improvement
  Components: Parsing
Reporter: savan patel
 Attachments: 5a9417fbcf00c.pdf

On page 4, reading the JPEG image gives the error "Numbers of source Raster 
bands and source color space components do not match"...






[GitHub] pdfbox pull request #44: When widths not contains the cid, get width from fo...

2018-02-26 Thread chunlinyao
GitHub user chunlinyao opened a pull request:

https://github.com/apache/pdfbox/pull/44

When widths not contains the cid, get width from font by code

Some PDFs that use a CJK font without an embedded subset are displayed 
incorrectly: the alphabetic characters become wider.

This is the result before the patch.

![2018-02-26_332x112](https://user-images.githubusercontent.com/148663/36660884-a2b6f984-1b14-11e8-9ab7-62138e3b8452.png)

This is the result after the patch.

![2018-02-26_378x96](https://user-images.githubusercontent.com/148663/36660944-c8c747e6-1b14-11e8-9c56-e923fcb86f01.png)

The original PDF:
[dd.pdf](https://github.com/apache/pdfbox/files/1757749/dd.pdf)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chunlinyao/pdfbox trunk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/pdfbox/pull/44.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #44


commit 8a754cad5ee6b732d86180dff60028a9304a015b
Author: chunlinyao 
Date:   2018-02-26T08:40:42Z

When widths not contains the cid, get width from font by code

Some PDFs that use a CJK font without an embedded subset are displayed 
incorrectly: the alphabetic characters become wider.




---
