Re: Remove AWT Fonts

2014-03-04 Thread Maruan Sahyoun
Hi John,

What about just using the platform fonts? If not, LaTeX uses the URW++ 
fonts, which were made available under the http://www.latex-project.org/lppl 
license (the same fonts are used by Ghostscript). We could check whether that 
license is compatible with ours.

BR
Maruan Sahyoun

On 03.03.2014 at 21:20, John Hewson wrote:

> Hi All
> 
> I wanted to bring PDFBOX-1959 to the attention of the mailing list. PDFBox is 
> ready to leave AWT font rendering behind, as the JDK's rendering has proven to 
> be buggy and we now have our own renderers for all font types in 2.0.0.
> 
> Before we can do this we need to ship a set of standard 14 fonts with PDFBox 
> as currently the system fonts are being used via AWT. We also need to provide 
> a mechanism for the user to supply their own external fonts for cases where 
> embedded fonts are missing. 
> 
> The main question is, what fonts should we ship? Some of the "free" fonts 
> I've seen render very poorly, any suggestions? Furthermore, are there fonts 
> under more restrictive licenses which we could ship? Apache does allow for 
> such files to be part of a project under certain conditions.
> 
> Also: Adobe has some font packs, e.g. Japanese, which we could point users 
> towards.
> 
> Cheers
> 
> -- John



Re: [GSoC 2014]Implement shading with Coons and tensor-product patch meshes

2014-03-04 Thread Thimal Kempitiya
Hi,
I checked the code related to shading and studied the part of the PDF spec
that covers type 6. As far as I can see, it works much like type 4.
This is what I think needs to be done; correct me if I'm wrong:
- First, read the 12 control points and the colors for each patch from the
  stream.
- Create the 4 cubic Bézier curves that form the boundaries of each patch
  (a cubic Bézier curve needs 4 control points, two of which lie on the
  curve).
- For a given point, find the patch that contains it (I think this can be
  done [1], but I need to research that).
- Find the color of the point using bilinear interpolation.

This has the same structure as the other shading types, but it needs a
structure to hold the Bézier curves and patches, and the patches are
connected as in type 4.
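
The curve-building step above can be sketched as follows. This is only an illustration, assuming control points are stored as plain {x, y} arrays; the class and method names (CubicBezierSketch, pointAt) are hypothetical and are not PDFBox code. It evaluates a cubic Bézier curve in Bernstein form, where the first and last control points lie on the curve and the two middle ones only steer it.

```java
/** Hypothetical helper, not part of PDFBox: evaluates one cubic Bézier boundary curve. */
public final class CubicBezierSketch {

    private CubicBezierSketch() {
    }

    /**
     * Returns the point {x, y} on the curve at parameter t, with 0 <= t <= 1.
     * p0 and p3 are the end points; p1 and p2 are the inner control points.
     */
    public static double[] pointAt(double[] p0, double[] p1, double[] p2, double[] p3, double t) {
        double s = 1.0 - t;
        // Cubic Bernstein basis polynomials; they sum to 1 for every t.
        double b0 = s * s * s;
        double b1 = 3.0 * s * s * t;
        double b2 = 3.0 * s * t * t;
        double b3 = t * t * t;
        return new double[] {
            b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0],
            b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
        };
    }
}
```

At t = 0 the method returns p0 and at t = 1 it returns p3, which matches the remark that two of the four control points are part of the curve.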

I still have to study type 7, but I think it's similar to this.

Please give feedback on my approach.

[1]http://en.wikipedia.org/wiki/Plane_%28geometry%29



On Sun, Mar 2, 2014 at 11:18 AM, Thimal Kempitiya wrote:

> Yes, I'm using the trunk code (2.0) and I wanted to render the image. Got it,
> thanks. I'm currently studying the implementations of shading types 1 to 5 and
> the PDF spec for shading types 6 and 7, and I'll let you know if I run into
> any issues. Once again, thanks for the quick reply.
>
>
> On Sat, Mar 1, 2014 at 2:38 AM, John Hewson  wrote:
>
>> You'll need to use the latest 2.0.0 snapshot jar, which is the unstable
>> version from trunk
>> and the place where new development occurs.
>>
>> -- John
>>
>> On 28 Feb 2014, at 05:04, Thimal Kempitiya  wrote:
>>
>> > Hi,
>> > I'm Thimal Kempitiya, a third-year computer science and engineering
>> > undergraduate at the University of Moratuwa. I'm interested in the project
>> > idea "implement shading with Coons and tensor-product patch meshes". I have
>> > basic knowledge of "cubic Bézier curves", "bilinear interpolation",
>> > and "Bernstein polynomials", and I think I can manage the rest of the
>> > mathematics needed for this project. I also have the Java knowledge to do
>> > the implementation part.
>> >
>> > I cloned the PDFBox repository, checked the code related to shading,
>> > and tried some examples from the PDFBox cookbook.
>> >
>> > I also tried some code to work with the type 1 shading type.
>> >
>> >File f=new File("C:\\asy-latticeshading.pdf");
>> >try {
>> >PDDocument doc=PDDocument.load(f);
>> >   PDPage p;
>> >  p=(PDPage)doc.getDocumentCatalog().getAllPages().get(0);
>> >  PDShadingType1 pdst1=new PDShadingType1(p.getCOSDictionary());
>> >  PDRectangle pdr=p.findCropBox();
>> >  PDGraphicsState pdg=new PDGraphicsState(pdr);
>> >  Matrix m=pdg.getCurrentTransformationMatrix();
>> >  Type1ShadingPaint t1sp=new Type1ShadingPaint(pdst1, m,
>> > (int)p.findCropBox().getHeight());
>> >
>> > But this gives me an error saying unknown shading type 0:
>> > java.io.IOException: Error: Unknown shading type 0
>> >
>> > Can you please tell me what I'm doing wrong here and how I can solve
>> > this?
>> > --
>> >
>> >
>> >
>> >
>> > *Thimal Kempitiya 
>> UndergraduateDepartment
>> > of Computer Science and Engineering University of Moratuwa.*
>>
>>
>
>
> --
>
>
>
>
> *Thimal Kempitiya 
> UndergraduateDepartment of Computer Science and Engineering University of
> Moratuwa.*
>



-- 




*Thimal Kempitiya, Undergraduate, Department
of Computer Science and Engineering, University of Moratuwa.*


[jira] [Reopened] (PDFBOX-1956) Wrong character on conversion PDF to TXT

2014-03-04 Thread Vicente (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vicente reopened PDFBOX-1956:
-


John Hewson,

You are right in your comment about 􀀳􀀧􀀩 􀀲􀁅􀁍􀁈􀁆􀁗􀁖 appearing when I copy and paste 
from the PDF into another editor, but we can still read the PDF visually. Do you know 
how I can check for a problem like this in a PDF? Is it possible to check for it 
with PDFBox?

> Wrong character on conversion PDF to TXT
> 
>
> Key: PDFBOX-1956
> URL: https://issues.apache.org/jira/browse/PDFBOX-1956
> Project: PDFBox
>  Issue Type: Task
>  Components: Parsing
>Affects Versions: 1.8.4
> Environment: Windows
>Reporter: Vicente
>  Labels: parser
> Attachments: example b.pdf, itext_pdfabc-sample.pdf
>
>
> I am trying to convert PDF to TXT, and for some PDFs the resulting String 
> contains wrong characters. Could it be a Unicode problem? Can somebody help me?
> I observed the problem when converting a PDF created by PDFCreator to 
> text. The characters are wrong. Any suggestions?
> The code:
> import java.io.File;
> import java.io.FileInputStream;
> import org.apache.pdfbox.cos.COSDocument;
> import org.apache.pdfbox.pdfparser.PDFParser;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.pdmodel.PDDocumentInformation;
> import org.apache.pdfbox.util.PDFTextStripper;
>
> public class PDFTextParser {
>
>     PDFParser parser;
>     String parsedText;
>     PDFTextStripper pdfStripper;
>     PDDocument pdDoc;
>     COSDocument cosDoc;
>     PDDocumentInformation pdDocInfo;
>
>     // PDFTextParser constructor
>     public PDFTextParser() {
>     }
>
>     // Extract text from a PDF document; returns null on failure
>     public String pdftoText(String fileName) {
>
>         System.out.println("Parsing text from PDF file " + fileName + "");
>         File f = new File(fileName);
>
>         if (!f.isFile()) {
>             System.out.println("File " + fileName + " does not exist.");
>             return null;
>         }
>
>         try {
>             parser = new PDFParser(new FileInputStream(f));
>         } catch (Exception e) {
>             System.out.println("Unable to open PDF Parser.");
>             return null;
>         }
>
>         try {
>             parser.parse();
>             cosDoc = parser.getDocument();
>             pdfStripper = new PDFTextStripper();
>             pdDoc = new PDDocument(cosDoc);
>             parsedText = pdfStripper.getText(pdDoc);
>         } catch (Exception e) {
>             System.out.println("An exception occurred in parsing the PDF Document.");
>             e.printStackTrace();
>             try {
>                 if (cosDoc != null) cosDoc.close();
>                 if (pdDoc != null) pdDoc.close();
>             } catch (Exception e1) {
>                 // report the failure to close, not the original exception again
>                 e1.printStackTrace();
>             }
>             return null;
>         }
>         System.out.println("Done.");
>         return parsedText;
>     }
> }
>



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PDFBOX-1709) processEncodedText gives x-coord short by width of previous text, for next text at same y-coord.

2014-03-04 Thread Robert Simms (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919494#comment-13919494
 ] 

Robert Simms commented on PDFBOX-1709:
--

True, the individual characters are at the correct positions in all three cases.

My point is that the method processEncodedText(), which is 
supposed to return strings of characters, is not handling horizontal gaps 
between characters consistently.

> processEncodedText gives x-coord short by width of previous text, for next 
> text at same y-coord.
> 
>
> Key: PDFBOX-1709
> URL: https://issues.apache.org/jira/browse/PDFBOX-1709
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 1.8.2
> Environment: Windows 7 sp1, Javac 1.6.0_30, Java 1.7.0_17
>Reporter: Robert Simms
>  Labels: test
> Attachments: PDFBOX1709-0.pdf, PDFBOX1709-1.pdf, PDFBOX1709-2.pdf
>
>
> Use this PostScript to create PDFs that demonstrate the x-coordinate issue with 
> processEncodedText().
> %!
> /Helvetica findfont 20 scalefont setfont
> 100 72 moveto
> (Hello) show
> % CASES
> %Uncomment any one of the following, make a PDF (with ghostscript ps2pdf,
> %or acrobat distiller),
> %then process the PDF with java implementation of PDFBox PDFTextStripper.
> %listing text and x,y positions obtained by overriding the
> %processEncodedText() method.
> %For example, the x-coord. of a text item may be printed in that method with
> %   System.out.format("%.2f\n", this.getTextMatrix().getXPosition());
> % % 0. Works to convince processEncodedText that string 'Hello world.' was at
> % %100,72.  This is good.
> %
> % ( world.) show
> % % 1. Does not trick processEncodedText into thinking 'Hello' followed by
> % %' ' + 'world.'
> % %Instead,
> % %x-coord. of 'world.' reported as being actual position minus width of
> % %'Hello', plus width of ' '
> % %which is x=105.56 in this case.
> %
> %( ) stringwidth pop 0 rmoveto
> %(world.) show
> % % 2. Positioning 'world.' within about 500 points from 'Hello', at same
> % %vertical position causes
> % %processEncodedText to give
> % %x-coord. of 'world.' as actual position minus width of 'Hello'
> % %which is x=200 in this case.
> %
> %100 0 rmoveto
> %(world.) show
> showpage





[jira] [Commented] (PDFBOX-1956) Wrong character on conversion PDF to TXT

2014-03-04 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919514#comment-13919514
 ] 

Tilman Hausherr commented on PDFBOX-1956:
-

The visual is just a bunch of glyph *images* that happen to make sense to you 
because you can read. 

To check whether the PDF is searchable, one solution could be to test whether 
the extracted words can be found in a dictionary of common words.
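
A minimal sketch of that dictionary check, assuming the text has already been extracted (for example with PDFTextStripper) and that the caller supplies its own word list. The class name, the method name and any threshold applied to the result are hypothetical and are not part of PDFBox.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper, not part of PDFBox: scores extracted text against a word list. */
public final class ExtractionSanityCheck {

    private ExtractionSanityCheck() {
    }

    /**
     * Returns the fraction (0.0 to 1.0) of words in the text that appear in the
     * dictionary. The dictionary is expected to contain lower-case words.
     * Text without any letters yields 0.0.
     */
    public static double knownWordRatio(String text, Set<String> dictionary) {
        int total = 0;
        int known = 0;
        // Split on every run of characters that are not Unicode letters.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (token.isEmpty()) {
                continue;
            }
            total++;
            if (dictionary.contains(token)) {
                known++;
            }
        }
        return total == 0 ? 0.0 : (double) known / total;
    }
}
```

A low ratio on a document that is known to contain ordinary prose would suggest that the extracted characters do not correspond to what is displayed; where to draw the line is a judgment call that depends on the word list and the language.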








Re: [GSoC 2014]Implement shading with Coons and tensor-product patch meshes

2014-03-04 Thread Tilman Hausherr

On 04.03.2014 15:19, Thimal Kempitiya wrote:

> Hi,
> I checked the code related to the shading and studied the pdf spec related
> to the type 6. As I see it is going same as the type 4
> From what I feel this is need to be done correct me if I'm wrong
> first need to get the 12 control points and colors related to each unit
> from stream

Yes

> create the 4 cubic Bézier curves which are boundaries of each patch ( to
> find a Bézier curve it need 4 control points, two points are part of the
> curve )

Yes, although they are not painted as a curve, the curve is part of a 
formula to find out whether a point is inside or outside the patch.

> given point need to find the point which patch (I think this can be
> done[1]but need
> research on that)
> find the color of the point using bi linear interpolation

Probably. Although I'm not sure if the bilinear interpolation is the 
same as used for a rectangle.

http://www.particleincell.com/blog/2012/quad-interpolation/
I also don't know if the curves need to be taken into account.
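
For reference, plain bilinear interpolation of four corner colors can be sketched as below. This assumes the patch coordinates (u, v) of the point, each in the range 0 to 1, are already known; finding (u, v) for a given pixel of a curved patch is the open question discussed above and is not covered. The class and method names are hypothetical and are not PDFBox code.

```java
/** Hypothetical helper, not part of PDFBox: blends the four corner colors of a patch. */
public final class BilinearColorSketch {

    private BilinearColorSketch() {
    }

    /**
     * Interpolates color components at (u, v), with 0 <= u, v <= 1.
     * c00 is the color at (0,0), c10 at (1,0), c01 at (0,1) and c11 at (1,1).
     * All four arrays must have the same number of components.
     */
    public static double[] interpolate(double[] c00, double[] c10, double[] c01, double[] c11,
                                       double u, double v) {
        double[] result = new double[c00.length];
        for (int i = 0; i < result.length; i++) {
            // Interpolate along u on both edges, then along v between the two results.
            double bottom = (1.0 - u) * c00[i] + u * c10[i];
            double top = (1.0 - u) * c01[i] + u * c11[i];
            result[i] = (1.0 - v) * bottom + v * top;
        }
        return result;
    }
}
```

At the corners the method returns the corresponding corner color unchanged, and at (0.5, 0.5) it returns the average of all four.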


> This has the same structure as the other shading types but need to
> structure to keep Bézier curves and patches
> and patches are connected as in the type4

Yes, the data structure is similar. You can use existing code and do a lot 
of copy & paste there, although some rearrangement is needed as 
there are more points.

> I have to study the type 7 but I think its similar to this

According to the spec, type 6 is a special case of type 7. I cannot tell 
whether it is enough to implement type 7 only and derive type 6 from it, 
i.e. I don't know if the performance would be worse.

Tilman

> please give feedback on my approach
>
> [1]http://en.wikipedia.org/wiki/Plane_%28geometry%29











[jira] [Commented] (PDFBOX-870) PDF-To-IMAGE output is not anti-aliased

2014-03-04 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919742#comment-13919742
 ] 

John Hewson commented on PDFBOX-870:


{quote}
The current plan
http://de.redtransporte.com/valencia/metro-valencia/plan.pdf
has no serifs.
{quote}

But the file in question is not the current plan, it's the old one with a 
missing embedded font.

> PDF-To-IMAGE output is not anti-aliased
> ---
>
> Key: PDFBOX-870
> URL: https://issues.apache.org/jira/browse/PDFBOX-870
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Rendering
>Affects Versions: 1.3.1
>Reporter: Nicolas Hoibian
> Attachments: a_metro-vlc.pdf, 
> a_metro-vlc.pdf-1-antialiasing-fontmods.png, 
> a_metro-vlc.pdf-1-antialioasing.png, a_metro-vlc.pdf-1.png, 
> pdf-renderer_vlc.png, pdfbox_vlc.png
>
>
> Hi
> I am a user of pdf-renderer from java.net, and I am looking into the pdf to 
> image part of pdfbox.
> So far it seems that pdfbox can render more of the pdf that are problematic 
> with pdf-renderer, but for those that work in both, the pdf-to-image output 
> is prettier for pdf-renderer, as both the font and shapes/path/lines are 
> anti-aliased.
> With PDFbox, the text is sometimes antialiased, but never the shapes/drawings.
> Is there a way to have shapes antialiased ?
> I am using the latest version from svn.
> Here are some examples in the difference of rendering





Re: Remove AWT Fonts

2014-03-04 Thread John Hewson
Hi Maruan

Java provides access to platform fonts via AWT and does not reveal the paths to
the fonts which it finds, so it is not practical to use platform fonts without
using AWT. There have also been a number of problems with some unix platforms
which lack some of the standard 14 fonts or which ship with poor quality
substitutes. Ideally, PDFBox should produce the same result irrespective of
which platform it is running on, much like Adobe Reader (excluding any missing
embedded fonts, of course).

I’ve had poor experiences in the past with the Nimbus family of fonts from
URW++ but there are numerous factors (kerning, hinting, metrics, TTF vs Type 1)
which may have changed since then. We should check out how well these fonts
compare with the standard 14 used by Adobe, in particular whether or not the
metrics actually match (I know that it is claimed that they do).

-- John

On 4 Mar 2014, at 05:48, Maruan Sahyoun  wrote:

> Hi John,
> 
> what about just using the platform fonts? If not then Latex uses the URW++ 
> fonts which were made available under the http://www.latex-project.org/lppl 
> license. (same fonts are used by Ghostscript). Could check if the license is 
> fine with ours.
> 
> BR
> Maruan Sahyoun
> 
> On 03.03.2014 at 21:20, John Hewson wrote:
> 
>> Hi All
>> 
>> I wanted to bring PDFBOX-1959 to the attention of the mailing list. PDFBox 
>> is ready to leave AWT font rendering behind, as the JDK's rendering has proven 
>> to be buggy and we now have our own renderers for all font types in 2.0.0.
>> 
>> Before we can do this we need to ship a set of standard 14 fonts with PDFBox 
>> as currently the system fonts are being used via AWT. We also need to 
>> provide a mechanism for the user to supply their own external fonts for 
>> cases where embedded fonts are missing. 
>> 
>> The main question is, what fonts should we ship? Some of the "free" fonts 
>> I've seen render very poorly, any suggestions? Furthermore, are there fonts 
>> under more restrictive licenses which we could ship? Apache does allow for 
>> such files to be part of a project under certain conditions.
>> 
>> Also: Adobe has some font packs, e.g. Japanese, which we could point users 
>> towards.
>> 
>> Cheers
>> 
>> -- John
> 



[jira] [Resolved] (PDFBOX-1956) Wrong character on conversion PDF to TXT

2014-03-04 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson resolved PDFBOX-1956.
-

Resolution: Invalid






[jira] [Commented] (PDFBOX-1956) Wrong character on conversion PDF to TXT

2014-03-04 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919787#comment-13919787
 ] 

John Hewson commented on PDFBOX-1956:
-

{quote}
Do you know how I can check problem in PDF (like this) ? Working with PDFBOX is 
possible to check it ?
{quote}

We see PDFs like this fairly often, the problem is that the text embedded in 
PDF is perfectly valid, it's just that the font's encoding is meaningless to a 
human. The embedded font maps the character 􀀳 to a glyph which is obviously the 
letter "P" but we have no way to know this, from our point of view the glyph 
claims to be 􀀳.

To detect PDFs with this problem, you could try 
https://code.google.com/p/language-detection/ and see if the language 
identified is what you were expecting. Let me know if you try this and it works.
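
As a cheaper first check before running language detection, one could count how many extracted characters fall into Unicode's private-use areas. This is only a sketch and rests on an assumption: that the unmappable codes surface as private-use code points, as the characters quoted in this issue appear to. Fonts with other kinds of meaningless encodings would not be caught. The class and method names are hypothetical and are not part of PDFBox.

```java
/** Hypothetical helper, not part of PDFBox: measures private-use characters in text. */
public final class PrivateUseCheck {

    private PrivateUseCheck() {
    }

    /**
     * Returns the fraction (0.0 to 1.0) of non-whitespace code points in the
     * text that belong to a Unicode private-use area. Empty text yields 0.0.
     */
    public static double privateUseRatio(String text) {
        int total = 0;
        int privateUse = 0;
        for (int i = 0; i < text.length(); ) {
            // Iterate by code point so supplementary characters count once.
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isWhitespace(cp)) {
                continue;
            }
            total++;
            if (Character.getType(cp) == Character.PRIVATE_USE) {
                privateUse++;
            }
        }
        return total == 0 ? 0.0 : (double) privateUse / total;
    }
}
```

A ratio close to 1.0 for text returned by PDFTextStripper would indicate that most of the output has no standard Unicode meaning, which is the situation described above.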






[jira] [Comment Edited] (PDFBOX-1956) Wrong character on conversion PDF to TXT

2014-03-04 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919787#comment-13919787
 ] 

John Hewson edited comment on PDFBOX-1956 at 3/4/14 7:15 PM:
-

{quote}
Do you know how I can check problem in PDF (like this) ? Working with PDFBOX is 
possible to check it ?
{quote}

We see PDFs like this fairly often, the problem is that the text embedded in 
the PDF is perfectly valid, it's just that the font's encoding is meaningless 
to a human. The embedded font maps the character 􀀳 to a glyph which is 
obviously the letter "P" but we have no way to know this, as the glyph claims 
to be 􀀳.

To detect PDFs with this problem, you could try 
https://code.google.com/p/language-detection/ and see if the language 
identified is what you were expecting. Let me know if you try this and it works.


was (Author: jahewson):
{quote}
Do you know how I can check problem in PDF (like this) ? Working with PDFBOX is 
possible to check it ?
{quote}

We see PDFs like this fairly often, the problem is that the text embedded in 
PDF is perfectly valid, it's just that the font's encoding is meaningless to a 
human. The embedded font maps the character 􀀳 to a glyph which is obviously the 
letter "P" but we have no way to know this, from our point of view the glyph 
claims to be 􀀳.

To detect PDFs with this problem, you could try 
https://code.google.com/p/language-detection/ and see if the language 
identified is what you were expecting. Let me know if you try this and it works.

> Wrong character on conversion PDF to TXT
> 
>
> Key: PDFBOX-1956
> URL: https://issues.apache.org/jira/browse/PDFBOX-1956
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Affects Versions: 1.8.4
> Environment: Windows
>Reporter: Vicente
>Priority: Minor
>  Labels: parser
> Attachments: example b.pdf, itext_pdfabc-sample.pdf
>
>





[jira] [Updated] (PDFBOX-1956) Wrong character on conversion PDF to TXT

2014-03-04 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1956:


Issue Type: Bug  (was: Task)

> Wrong character on conversion PDF to TXT
> 
>
> Key: PDFBOX-1956
> URL: https://issues.apache.org/jira/browse/PDFBOX-1956
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Affects Versions: 1.8.4
> Environment: Windows
>Reporter: Vicente
>  Labels: parser
> Attachments: example b.pdf, itext_pdfabc-sample.pdf
>
>





[jira] [Updated] (PDFBOX-1956) Wrong character on conversion PDF to TXT

2014-03-04 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1956:


Priority: Minor  (was: Major)

> Wrong character on conversion PDF to TXT
> 
>
> Key: PDFBOX-1956
> URL: https://issues.apache.org/jira/browse/PDFBOX-1956
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Affects Versions: 1.8.4
> Environment: Windows
>Reporter: Vicente
>Priority: Minor
>  Labels: parser
> Attachments: example b.pdf, itext_pdfabc-sample.pdf
>
>
> I am trying to convert PDF to TXT and some PDF, after converted, the String 
> present wrong character. Could be UNICODE problem ? Can somebody help me ?
> I oberved that the problem when try to convert PDF, created by PDFCreator, in 
> Text. The character are wrong. Any suggesting ?
> the code 
> public class PDFTextParser {
> 
> PDFParser parser;
> String parsedText;
> PDFTextStripper pdfStripper;
> PDDocument pdDoc;
> COSDocument cosDoc;
> PDDocumentInformation pdDocInfo;
> 
> // PDFTextParser Constructor 
> public PDFTextParser() {
> }
> 
> // Extract text from PDF Document
> public String pdftoText(String fileName) {
> 
> System.out.println("Parsing text from PDF file " + fileName + "");
> File f = new File(fileName);
> 
> if (!f.isFile()) {
> System.out.println("File " + fileName + " does not exist.");
> return null;
> }
> 
> try {
> parser = new PDFParser(new FileInputStream(f));
> } catch (Exception e) {
> System.out.println("Unable to open PDF Parser.");
> return null;
> }
> 
> try {
> parser.parse();
> cosDoc = parser.getDocument();
> pdfStripper = new PDFTextStripper();
> pdDoc = new PDDocument(cosDoc);
> parsedText = pdfStripper.getText(pdDoc); 
> } catch (Exception e) {
> System.out.println("An exception occurred in parsing the PDF Document.");
> e.printStackTrace();
> try {
> if (cosDoc != null) cosDoc.close();
> if (pdDoc != null) pdDoc.close();
> } catch (Exception e1) {
> e1.printStackTrace();
> }
> return null;
> }  
> System.out.println("Done.");
> return parsedText;
> }
> 





[jira] [Comment Edited] (PDFBOX-1956) Wrong character on conversion PDF to TXT

2014-03-04 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919787#comment-13919787
 ] 

John Hewson edited comment on PDFBOX-1956 at 3/4/14 7:15 PM:
-

{quote}
Do you know how I can check for this problem in a PDF (like this one)? Is it 
possible to check it with PDFBox?
{quote}

We see PDFs like this fairly often. The text embedded in the PDF is perfectly 
valid; it's just that the font's encoding is meaningless to a human. The 
embedded font maps the character 􀀳 to a glyph which is obviously the letter "P", 
but we have no way to know this, as the glyph claims to be 􀀳 and PDFBox cannot 
read the glyph as a human could, at least not without OCR.

To detect PDFs with this problem, you could try 
https://code.google.com/p/language-detection/ and see if the language 
identified is what you were expecting. Let me know if you try this and it works.
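
A cheaper first check than full language detection might be to count how many 
extracted code points fall into Unicode private-use or unassigned ranges, as 
the character quoted above does. This is only a sketch of that idea and not 
something PDFBox provides; the class name, the method name and any threshold 
you pick are assumptions.

```java
class PrivateUseCheck {

    /**
     * Returns the fraction of code points in the text that are private-use
     * or unassigned. Hypothetical helper, not part of PDFBox.
     */
    static double suspiciousRatio(String text) {
        if (text.isEmpty()) {
            return 0.0;
        }
        int total = 0;
        int suspicious = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp); // step over surrogate pairs correctly
            total++;
            int type = Character.getType(cp);
            if (type == Character.PRIVATE_USE || type == Character.UNASSIGNED) {
                suspicious++;
            }
        }
        return (double) suspicious / total;
    }

    public static void main(String[] args) {
        // U+100033 lies in Supplementary Private Use Area-B.
        String sample = "P" + new String(Character.toChars(0x100033));
        System.out.println(suspiciousRatio(sample));
    }
}
```

A ratio well above zero hints at a font with a meaningless encoding. A font 
that maps glyphs to wrong but ordinary letters would not be caught this way, 
so language detection remains the more general test.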


was (Author: jahewson):
{quote}
Do you know how I can check problem in PDF (like this) ? Working with PDFBOX is 
possible to check it ?
{quote}

We see PDFs like this fairly often, the problem is that the text embedded in 
the PDF is perfectly valid, it's just that the font's encoding is meaningless 
to a human. The embedded font maps the character 􀀳 to a glyph which is 
obviously the letter "P" but we have no way to know this, as the glyph claims 
to be 􀀳.

To detect PDFs with this problem, you could try 
https://code.google.com/p/language-detection/ and see if the language 
identified is what you were expecting. Let me know if you try this and it works.



[jira] [Reopened] (PDFBOX-52) DCTFilter is not implemented yet

2014-03-04 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-52?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr reopened PDFBOX-52:
---

  Assignee: Tilman Hausherr  (was: John Hewson)

Reopen to include my change in the 1.8 branch.

> DCTFilter is not implemented yet
> 
>
> Key: PDFBOX-52
> URL: https://issues.apache.org/jira/browse/PDFBOX-52
> Project: PDFBox
>  Issue Type: New Feature
>  Components: Rendering
>Assignee: Tilman Hausherr
> Fix For: 2.0.0
>
> Attachments: FilterManager.patch, JPXFilter.patch, 
> amyuni2_05d__pdf1_3_acro4x.pdf-1.png, dctfilter.patch
>
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552835&aid=1181506
> Originally submitted by joswol on 2005-04-12 07:03.
> PDFBox-0.7.1: org.pdfbox.filter.DCTFilter  - Warning:
> DCTFilter.decode is not implemented yet, skipping this
> stream.





[jira] [Resolved] (PDFBOX-52) DCTFilter is not implemented yet

2014-03-04 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-52?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-52.
---

   Resolution: Fixed
Fix Version/s: 1.8.5

Committed in rev. 1574180. I didn't delete DCTFilter in case John wants to make 
a different solution. I also didn't include the patch of Simon because I don't 
have a test file.

> DCTFilter is not implemented yet
> 
>
> Key: PDFBOX-52
> URL: https://issues.apache.org/jira/browse/PDFBOX-52
> Project: PDFBox
>  Issue Type: New Feature
>  Components: Rendering
>Assignee: Tilman Hausherr
> Fix For: 1.8.5, 2.0.0
>
> Attachments: FilterManager.patch, JPXFilter.patch, 
> amyuni2_05d__pdf1_3_acro4x.pdf-1.png, dctfilter.patch
>
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552835&aid=1181506
> Originally submitted by joswol on 2005-04-12 07:03.
> PDFBox-0.7.1: org.pdfbox.filter.DCTFilter  - Warning:
> DCTFilter.decode is not implemented yet, skipping this
> stream.





Re: [GSoC 2014]Optical Character Recognition project - Introduction

2014-03-04 Thread John Hewson
Hi Dimuthu,

1,2,3:

Feel free to write your own Tesseract binding or port the existing code as you 
see fit.
The JNI binding should be minimal, only the methods you require need to be 
wrapped.
Also, don’t forget that some of the interop can be done in Java, for example if 
it is easier
to convert a BufferedImage to a byte array in Java then do it there and pass 
the result
to JNI rather than writing lots of JNI C++ to achieve the same result.
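
To illustrate the point about doing the conversion on the Java side, here is a 
minimal sketch that packs a BufferedImage into an 8-bit grayscale byte array 
which could then be handed to a native method. The class and method names are 
made up for this example, and the pixel layout a Tesseract binding would 
actually expect is an assumption.

```java
import java.awt.image.BufferedImage;

class ImageBytes {

    /**
     * Packs the image into one byte per pixel (8-bit gray), row by row from
     * the top. A hypothetical native method could then take
     * (byte[] pixels, int width, int height).
     */
    static byte[] toGrayBytes(BufferedImage image) {
        int w = image.getWidth();
        int h = image.getHeight();
        byte[] out = new byte[w * h];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = image.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer luma approximation; the weights sum to 1000.
                out[y * w + x] = (byte) ((r * 299 + g * 587 + b * 114) / 1000);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        img.setRGB(0, 0, 0xFFFFFF);
        img.setRGB(1, 0, 0x000000);
        byte[] bytes = toGrayBytes(img);
        System.out.println((bytes[0] & 0xFF) + " " + (bytes[1] & 0xFF));
    }
}
```

Keeping this in Java means the JNI side only needs to accept a plain byte 
array plus the image dimensions.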

Your GitHub repo looks like a good start, I can make comments there as things 
progress.

Is it possible to build Tesseract without leptonica? I was under the impression 
that it was
used for image i/o only, but I may be misinformed.

4:  The native platform library should be built as part of the Maven build for 
the Tesseract
wrapper which can be a separate project. The output can be a jar file which 
contains the
native binaries. It should be possible for the jar to contain prebuilt binaries 
for all platforms
but this is something we can worry about later. Right now the goal should be to 
build a jar
containing just the current platform’s native binary and any Java wrapper code.
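
One way the Java wrapper could pick up a native binary packaged inside the jar 
is sketched below: look up a per-platform resource, copy it to a temporary 
file, and load it with System.load. The resource layout (/native/...), the 
class name and the library name passed in are all assumptions for 
illustration, not an agreed design.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Locale;

class NativeLoader {

    /** Maps OS name and architecture to a resource path (hypothetical layout). */
    static String resourcePath(String osName, String arch, String libName) {
        String os = osName.toLowerCase(Locale.ROOT);
        String dir;
        String file;
        if (os.contains("mac") || os.contains("darwin")) {
            dir = "mac";
            file = "lib" + libName + ".dylib";
        } else if (os.contains("win")) {
            dir = "windows";
            file = libName + ".dll";
        } else {
            dir = "linux";
            file = "lib" + libName + ".so";
        }
        return "/native/" + dir + "-" + arch + "/" + file;
    }

    /** Extracts the bundled library to a temp file and loads it. */
    static void load(String libName) throws IOException {
        String path = resourcePath(System.getProperty("os.name"),
                                   System.getProperty("os.arch"), libName);
        try (InputStream in = NativeLoader.class.getResourceAsStream(path)) {
            if (in == null) {
                throw new IOException("No bundled native library at " + path);
            }
            String suffix = path.substring(path.lastIndexOf('.'));
            Path tmp = Files.createTempFile(libName, suffix);
            tmp.toFile().deleteOnExit();
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            System.load(tmp.toAbsolutePath().toString());
        }
    }

    public static void main(String[] args) {
        System.out.println(resourcePath(System.getProperty("os.name"),
                                        System.getProperty("os.arch"), "tessjni"));
    }
}
```

With this layout a jar built on one platform contains a single directory under 
/native, and prebuilt binaries for other platforms could be added later 
without changing the loader.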

-- John

On 3 Mar 2014, at 16:41, DImuthu Upeksha  wrote:

> Hi John,
> 
> I tried to reuse that android jni wrapper for tesseract. Here is my
> observation
> 
> 1. This wrapper heavily depends on android image libraries.
> (android/bitmap.h). Most of the wrapper methods [1] use this library.
> 
> 2. But I can understand underlying logic in each function. Basically what
> it does is mapping between tesseract api functions [2] with java methods.
> In between it does to some image <=> byte array like conversions by using
> that bitmap libraries in Android
> 
> 3. There are two ways. 1: We can port its code to make it compatible with our
> environments (Linux, Windows and Mac), which is really painful. Also it will
> cause memory leaks. 2: We can use only its function signatures and
> implement them using our own code.
> 
> I think the 2nd solution is better because we need only a few operations to be
> done using the tesseract library. I have created a github repo [3] for this.
> It's still not finished. I need to add some make files and build files to
> make it run properly. And also I need to implement those wrapper functions
> [4]. This may take some time.
> 
> 4. Because we are calling native libraries we need different builds of
> tesseract and leptonica libraries for each platform (dll for windows, so
> for linux, dylib for mac). So we may need to build those libraries at the
> time we build pdfbox project. Or we can pre build those libraries and add
> them to the project as .dll, .so or .dylib format. What is the preferred
> way?
> 
> [1]
> https://code.google.com/p/tesseract-android-tools/source/browse/tesseract-android-tools/jni/com_googlecode_tesseract_android/tessbaseapi.cpp
> [2] https://code.google.com/p/tesseract-ocr/wiki/APIExample
> [3] https://github.com/DImuthuUpe/Tesseract-API
> [4]
> https://github.com/DImuthuUpe/Tesseract-API/blob/master/jni/tesseract/tessbaseapi.cpp
> 
> Thanks
> Dimuthu
> 
> 
> On Sat, Mar 1, 2014 at 11:39 PM, DImuthu Upeksha wrote:
> 
>> I updated necessary changes to the document [1]
>> 
>> For the last two days I had a deep look at this [2] JNI wrapper for the
>> tesseract API.
>> Unfortunately it has been designed for the Android environment, so I think we
>> need to write our own make files to build it into a dll (Windows) or
>> dylib (Mac). Currently it has Android.mk files [3]. I'm searching for a
>> way to convert it to a make file that we can run on the console. Please
>> suggest a better approach if you have one.
>> 
>> [1]
>> https://www.dropbox.com/s/9qclvq26divwr2q/Optical%20Character%20Recognition%20for%20PDFBox%20-%20updated.pdf
>> [2]
>> https://code.google.com/p/tesseract-android-tools/source/browse/tesseract-android-tools/jni/com_googlecode_tesseract_android/
>> [3]
>> https://code.google.com/p/tesseract-android-tools/source/browse/tesseract-android-tools/jni/com_googlecode_tesseract_android/Android.mk
>> 
>> 
>> On Sat, Mar 1, 2014 at 12:27 AM, John Hewson  wrote:
>> 
>>> This is a good start. However, there is no need for the Adder component,
>>> "Extracted Text (OCR)" can just feed back into the PDFBox "Text Extractor".
>>> 
>>> Maybe show a "PDF" file feeding in to "Text Extractor", to make it clear
>>> where the process starts.
>>> 
>>> -- John
>>> 
>>> On 26 Feb 2014, at 16:53, DImuthu Upeksha 
>>> wrote:
>>> 
>>>> Sorry for the mistake. I added it to my Dropbox [1].
>>>> 
>>>> [1]
>>>> https://www.dropbox.com/s/y3m15rfjmw4eqij/Optical%20Character%20Recognition%20for%20PDFBox.pdf
>>>> 
>>>> Thanks
>>>> Dimuthu
>>>> 
>>>> 
>>>> On Thu, Feb 27, 2014 at 4:44 AM, John Hewson  wrote:
>>>> 
>>>>> I should add that the OCR engine should be pluggable so PDFToText might
>>>>> use an interface, e.g. OCREngine and there will be a TesseractOCREngine
>>>>> class somewhere which provides the required functionality and lives in a
>>>>> separate jar file.
>>>>> 
>>>>> --
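
The pluggable engine idea quoted above could look roughly like the sketch 
below. Only the names OCREngine and TesseractOCREngine come from the thread; 
the method signature, the stand-in engine and the extractor class are 
assumptions made for illustration.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;

/** Pluggable OCR abstraction; the method signature is an assumption. */
interface OCREngine {
    String recognize(BufferedImage pageImage) throws IOException;
}

/** Stand-in engine for this sketch; a TesseractOCREngine would call JNI here. */
class FixedTextEngine implements OCREngine {
    private final String text;

    FixedTextEngine(String text) {
        this.text = text;
    }

    @Override
    public String recognize(BufferedImage pageImage) {
        return text;
    }
}

/** Hypothetical caller: uses embedded text if present, otherwise falls back to OCR. */
class TextExtractorSketch {
    private final OCREngine engine;

    TextExtractorSketch(OCREngine engine) {
        this.engine = engine;
    }

    String extract(String embeddedText, BufferedImage pageImage) throws IOException {
        if (embeddedText != null && !embeddedText.trim().isEmpty()) {
            return embeddedText;
        }
        return engine.recognize(pageImage);
    }

    public static void main(String[] args) throws IOException {
        BufferedImage page = new BufferedImage(1, 1, BufferedImage.TYPE_BYTE_GRAY);
        TextExtractorSketch extractor = new TextExtractorSketch(new FixedTextEngine("text from OCR"));
        System.out.println(extractor.extract("", page));
    }
}
```

Because the caller depends only on the interface, the Tesseract-backed 
implementation and its native binaries can live in a separate jar.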

Re: Remove AWT Fonts

2014-03-04 Thread Maruan Sahyoun
Hi John,

what I was having in mind is something similar to Apache FOP’s auto detect 
feature for fonts.

doc: https://xmlgraphics.apache.org/fop/1.1/fonts.html
code: 
http://svn.apache.org/viewvc/xmlgraphics/fop/trunk/src/java/org/apache/fop/fonts/autodetect/
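
A rough sketch of what such auto-detection could do without AWT is to walk a 
few well-known font directories and collect files by extension. This is not 
FOP's implementation; the directory list, the extensions and the class name 
are assumptions, and a real version would need to handle unreadable 
directories.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FontScanner {

    /** Recursively lists font files under root; returns an empty list if root is not a directory. */
    static List<Path> findFonts(Path root) throws IOException {
        if (!Files.isDirectory(root)) {
            return Collections.emptyList();
        }
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString().toLowerCase(Locale.ROOT);
                        return name.endsWith(".ttf") || name.endsWith(".otf") || name.endsWith(".pfb");
                    })
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Commonly used locations; assumed here, not taken from FOP.
        String[] candidates = {
            "/usr/share/fonts",
            "/Library/Fonts",
            "C:\\Windows\\Fonts",
            System.getProperty("user.home") + "/.fonts"
        };
        for (String dir : candidates) {
            System.out.println(dir + ": " + findFonts(Paths.get(dir)).size() + " font files");
        }
    }
}
```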

For inclusion, these are some additional candidates:

https://fedorahosted.org/liberation-fonts/ (SIL licensed 
http://scripts.sil.org/cms/scripts/page.php?item_id=OFL-FAQ_web&_sc=1#68092c0f)
http://dejavu-fonts.org/ (http://dejavu-fonts.org/wiki/License)
Croscore fonts https://fedoraproject.org/wiki/I18N/Liberation_vs_Croscore_fonts


I’d think that if we can avoid bundling a set of fonts, and instead use OS 
fonts and/or allow people to use their own, it will help us in the long run: 
if the quality is not in line with the fonts used by Adobe Reader, there will 
be additional questions/issues/bug reports which we are not able to resolve.

BR

Maruan Sahyoun

Am 04.03.2014 um 19:34 schrieb John Hewson :

> Hi Maruan
> 
> Java provides access to platform fonts via AWT and does not reveal the paths 
> to the fonts
> which it finds, so it is not practical to use platform fonts without using 
> AWT. There have also
> been a number of problems with some unix platforms which lack some of the 
> standard 14
> fonts or which ship with poor quality substitutes. Ideally, PDFBox should 
> produce the same
> result irrespective of which platform it is running on, much like Adobe 
> Reader (excluding any
> missing embedded fonts, of course).
> 
> I’ve had poor experiences in the past with the Nimbus family of fonts from 
> URW++ but there
> are numerous factors (kerning, hinting, metrics, TTF vs Type 1) which may 
> have changed since
> then. We should check out how well these fonts compare with the standard 14 
> used by Adobe,
> in particular whether or not the metrics actually match (I know that it is 
> claimed that they do).
> 
> -- John
> 
> On 4 Mar 2014, at 05:48, Maruan Sahyoun  wrote:
> 
>> Hi John,
>> 
>> what about just using the platform fonts? If not then Latex uses the URW++ 
>> fonts which were made available under the http://www.latex-project.org/lppl 
>> license. (same fonts are used by Ghostscript). Could check if the license is 
>> fine with ours.
>> 
>> BR
>> Maruan Sahyoun
>> 
>> Am 03.03.2014 um 21:20 schrieb John Hewson :
>> 
>>> Hi All
>>> 
>>> I wanted to bring PDFBOX-1959 to the attention of the mailing list. PDFBox 
>>> is ready to leave AWT font rendering behind as the JDKs rendering has 
>>> proven to be buggy and we now have our own renderers for all font types in 
>>> 2.0.0.
>>> 
>>> Before we can do this we need to ship a set of standard 14 fonts with 
>>> PDFBox as currently the system fonts are being used via AWT. We also need 
>>> to provide a mechanism for the user to supply their own external fonts for 
>>> cases where embedded fonts are missing. 
>>> 
>>> The main question is, what fonts should we ship? Some of the "free" fonts 
>>> I've seen render very poorly, any suggestions? Furthermore, are there fonts 
>>> under more restrictive licenses which we could ship? Apache does allow for 
>>> such files to be part of a project under certain conditions.
>>> 
>>> Also: Adobe has some font packs, e.g. Japanese, which we could point users 
>>> towards.
>>> 
>>> Cheers
>>> 
>>> -- John
>> 
> 



[jira] [Updated] (PDFBOX-1709) processEncodedText gives wrong coordinates

2014-03-04 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1709:


Summary: processEncodedText gives wrong coordinates  (was: 
processEncodedText gives x-coord short by width of previous text, for next text 
at same y-coord.)

> processEncodedText gives wrong coordinates
> --
>
> Key: PDFBOX-1709
> URL: https://issues.apache.org/jira/browse/PDFBOX-1709
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 1.8.2
> Environment: Windows 7 sp1, Javac 1.6.0_30, Java 1.7.0_17
>Reporter: Robert Simms
>  Labels: test
> Attachments: PDFBOX1709-0.pdf, PDFBOX1709-1.pdf, PDFBOX1709-2.pdf
>
>
> processEncodedText gives x-coord short by width of previous text, for next 
> text at same y-coord.
> ---
> Use this PostScript to create PDFs that demonstrate x-coordinate issue with 
> processEncodedText().
> %!
> /Helvetica findfont 20 scalefont setfont
> 100 72 moveto
> (Hello) show
> % CASES
> %Uncomment any one of the following, make a PDF (with ghostscript ps2pdf, 
> or acrobat distiller),
> %then process the PDF with java implementation of PDFBox PDFTextStripper.
> %listing text and x,y positions obtained by overriding the 
> processEncodedText() method.
> %For example, the x-coord. of a text item may be printed in that method 
> with
> %   System.out.format("%.2f\n", this.getTextMatrix().getXPosition());
> % % 0. Works to convince processEncodedText that string 'Hello world.' was at 
> 100,72.  This is good.
> %
> % ( world.) show
> % % 1. Does not trick processEncodedText into thinking 'Hello' followed by ' 
> ' + 'world.'
> % %Instead,
> % %x-coord. of 'world.' reported as being actual position minus width of 
> 'Hello', plus width of ' '
> % %which is x=105.56 in this case.
> % 
> %( ) stringwidth pop 0 rmoveto
> %(world.) show
> % % 2. Positioning 'world.' within about 500 points from 'Hello', at same 
> vertical position causes
> % %processEncodedText to give
> % %x-coord. of 'world.' as actual position minus width of 'Hello'
> % %which is x=200 in this case.
> %
> %100 0 rmoveto
> %(world.) show
> showpage





[jira] [Updated] (PDFBOX-1709) processEncodedText gives x-coord short by width of previous text, for next text at same y-coord.

2014-03-04 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1709:


Description: 
processEncodedText gives x-coord short by width of previous text, for next text 
at same y-coord.

---

Use this PostScript to create PDFs that demonstrate x-coordinate issue with 
processEncodedText().

%!
/Helvetica findfont 20 scalefont setfont

100 72 moveto
(Hello) show

% CASES
%Uncomment any one of the following, make a PDF (with ghostscript ps2pdf, or acrobat distiller),
%then process the PDF with java implementation of PDFBox PDFTextStripper.
%listing text and x,y positions obtained by overriding the processEncodedText() method.
%For example, the x-coord. of a text item may be printed in that method with
%   System.out.format("%.2f\n", this.getTextMatrix().getXPosition());

% % 0. Works to convince processEncodedText that string 'Hello world.' was at 100,72.  This is good.
%
% ( world.) show

% % 1. Does not trick processEncodedText into thinking 'Hello' followed by ' ' + 'world.'
% %Instead,
% %x-coord. of 'world.' reported as being actual position minus width of 'Hello', plus width of ' '
% %which is x=105.56 in this case.
%
%( ) stringwidth pop 0 rmoveto
%(world.) show

% % 2. Positioning 'world.' within about 500 points from 'Hello', at same vertical position causes
% %processEncodedText to give
% %x-coord. of 'world.' as actual position minus width of 'Hello'
% %which is x=200 in this case.
%
%100 0 rmoveto
%(world.) show


showpage


  was:
Use this PostScript to create PDFs that demonstrate x-coordinate issue with 
processEncodedText().

%!
/Helvetica findfont 20 scalefont setfont

100 72 moveto
(Hello) show

% CASES
%Uncomment any one of the following, make a PDF (with ghostscript ps2pdf, 
or acrobat distiller),
%then process the PDF with java implementation of PDFBox PDFTextStripper.
%listing text and x,y positions obtained by overriding the 
processEncodedText() method.
%For example, the x-coord. of a text item may be printed in that method with
%   System.out.format("%.2f\n", this.getTextMatrix().getXPosition());

% % 0. Works to convince processEncodedText that string 'Hello world.' was at 
100,72.  This is good.
%
% ( world.) show

% % 1. Does not trick processEncodedText into thinking 'Hello' followed by ' ' 
+ 'world.'
% %Instead,
% %x-coord. of 'world.' reported as being actual position minus width of 
'Hello', plus width of ' '
% %which is x=105.56 in this case.
% 
%( ) stringwidth pop 0 rmoveto
%(world.) show

% % 2. Positioning 'world.' within about 500 points from 'Hello', at same 
vertical position causes
% %processEncodedText to give
% %x-coord. of 'world.' as actual position minus width of 'Hello'
% %which is x=200 in this case.
%
%100 0 rmoveto
%(world.) show


showpage




[jira] [Updated] (PDFBOX-1709) processEncodedText gives wrong coordinates

2014-03-04 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1709:


Description: 
PDFStreamEngine#processEncodedText gives x-coord short by width of previous 
text, for next text at same y-coord.

---

Use this PostScript to create PDFs that demonstrate x-coordinate issue with 
processEncodedText().

%!
/Helvetica findfont 20 scalefont setfont

100 72 moveto
(Hello) show

% CASES
%Uncomment any one of the following, make a PDF (with ghostscript ps2pdf, or acrobat distiller),
%then process the PDF with java implementation of PDFBox PDFTextStripper.
%listing text and x,y positions obtained by overriding the processEncodedText() method.
%For example, the x-coord. of a text item may be printed in that method with
%   System.out.format("%.2f\n", this.getTextMatrix().getXPosition());

% % 0. Works to convince processEncodedText that string 'Hello world.' was at 100,72.  This is good.
%
% ( world.) show

% % 1. Does not trick processEncodedText into thinking 'Hello' followed by ' ' + 'world.'
% %Instead,
% %x-coord. of 'world.' reported as being actual position minus width of 'Hello', plus width of ' '
% %which is x=105.56 in this case.
%
%( ) stringwidth pop 0 rmoveto
%(world.) show

% % 2. Positioning 'world.' within about 500 points from 'Hello', at same vertical position causes
% %processEncodedText to give
% %x-coord. of 'world.' as actual position minus width of 'Hello'
% %which is x=200 in this case.
%
%100 0 rmoveto
%(world.) show


showpage


  was:
processEncodedText gives x-coord short by width of previous text, for next text 
at same y-coord.

---

Use this PostScript to create PDFs that demonstrate x-coordinate issue with 
processEncodedText().

%!
/Helvetica findfont 20 scalefont setfont

100 72 moveto
(Hello) show

% CASES
%Uncomment any one of the following, make a PDF (with ghostscript ps2pdf, 
or acrobat distiller),
%then process the PDF with java implementation of PDFBox PDFTextStripper.
%listing text and x,y positions obtained by overriding the 
processEncodedText() method.
%For example, the x-coord. of a text item may be printed in that method with
%   System.out.format("%.2f\n", this.getTextMatrix().getXPosition());

% % 0. Works to convince processEncodedText that string 'Hello world.' was at 
100,72.  This is good.
%
% ( world.) show

% % 1. Does not trick processEncodedText into thinking 'Hello' followed by ' ' 
+ 'world.'
% %Instead,
% %x-coord. of 'world.' reported as being actual position minus width of 
'Hello', plus width of ' '
% %which is x=105.56 in this case.
% 
%( ) stringwidth pop 0 rmoveto
%(world.) show

% % 2. Positioning 'world.' within about 500 points from 'Hello', at same 
vertical position causes
% %processEncodedText to give
% %x-coord. of 'world.' as actual position minus width of 'Hello'
% %which is x=200 in this case.
%
%100 0 rmoveto
%(world.) show


showpage




[jira] [Commented] (PDFBOX-1709) processEncodedText gives wrong coordinates

2014-03-04 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919849#comment-13919849
 ] 

John Hewson commented on PDFBOX-1709:
-

{quote}
is not handling horizontal gaps between characters consistently.
{quote}

That's rather subjective, what result did you expect, exactly?
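
For reference, the figures in the issue description can be reproduced from the 
standard Helvetica advance widths (H=722, e=556, l=222, o=556, space=278 in 
1/1000 em) at 20 pt, starting at x=100. The sketch below only redoes that 
arithmetic; it does not call PDFBox, and the class and method names are made 
up for the example.

```java
import java.util.Locale;

class XCoordCheck {

    /** Helvetica advance widths in 1/1000 em for the characters used in the issue. */
    static int helveticaWidth(char c) {
        switch (c) {
            case 'H': return 722;
            case 'e': return 556;
            case 'l': return 222;
            case 'o': return 556;
            case ' ': return 278;
            default: throw new IllegalArgumentException("no width for '" + c + "'");
        }
    }

    static double stringWidth(String s, double fontSize) {
        int units = 0;
        for (char c : s.toCharArray()) {
            units += helveticaWidth(c);
        }
        return units * fontSize / 1000.0;
    }

    /** The x value the issue says is reported: actual x minus the width of the previous text. */
    static double reportedX(double actualX, String previousText, double fontSize) {
        return actualX - stringWidth(previousText, fontSize);
    }

    public static void main(String[] args) {
        double size = 20;
        double afterHello = 100 + stringWidth("Hello", size);
        double actual1 = afterHello + stringWidth(" ", size); // case 1: rmoveto by the width of a space
        double actual2 = afterHello + 100;                    // case 2: 100 0 rmoveto
        System.out.printf(Locale.ROOT, "case 1: actual %.2f, reported %.2f%n",
                actual1, reportedX(actual1, "Hello", size));
        System.out.printf(Locale.ROOT, "case 2: actual %.2f, reported %.2f%n",
                actual2, reportedX(actual2, "Hello", size));
    }
}
```

Under these metrics 'world.' actually starts at x=151.12 in case 1 and 
x=245.56 in case 2, while the values described in the report are 105.56 and 
200, each short by the 45.56 pt width of 'Hello'.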

> processEncodedText gives wrong coordinates
> --
>
> Key: PDFBOX-1709
> URL: https://issues.apache.org/jira/browse/PDFBOX-1709
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 1.8.2
> Environment: Windows 7 sp1, Javac 1.6.0_30, Java 1.7.0_17
>Reporter: Robert Simms
>  Labels: test
> Attachments: PDFBOX1709-0.pdf, PDFBOX1709-1.pdf, PDFBOX1709-2.pdf
>
>
> PDFStreamEngine#processEncodedText gives x-coord short by width of previous 
> text, for next text at same y-coord.
> ---
> Use this PostScript to create PDFs that demonstrate x-coordinate issue with 
> processEncodedText().
> %!
> /Helvetica findfont 20 scalefont setfont
> 100 72 moveto
> (Hello) show
> % CASES
> %Uncomment any one of the following, make a PDF (with ghostscript ps2pdf, 
> or acrobat distiller),
> %then process the PDF with java implementation of PDFBox PDFTextStripper.
> %listing text and x,y positions obtained by overriding the 
> processEncodedText() method.
> %For example, the x-coord. of a text item may be printed in that method 
> with
> %   System.out.format("%.2f\n", this.getTextMatrix().getXPosition());
> % % 0. Works to convince processEncodedText that string 'Hello world.' was at 
> 100,72.  This is good.
> %
> % ( world.) show
> % % 1. Does not trick processEncodedText into thinking 'Hello' followed by ' 
> ' + 'world.'
> % %Instead,
> % %x-coord. of 'world.' reported as being actual position minus width of 
> 'Hello', plus width of ' '
> % %which is x=105.56 in this case.
> % 
> %( ) stringwidth pop 0 rmoveto
> %(world.) show
> % % 2. Positioning 'world.' within about 500 points from 'Hello', at same 
> vertical position causes
> % %processEncodedText to give
> % %x-coord. of 'world.' as actual position minus width of 'Hello'
> % %which is x=200 in this case.
> %
> %100 0 rmoveto
> %(world.) show
> showpage



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PDFBOX-52) DCTFilter is not implemented yet

2014-03-04 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919857#comment-13919857
 ] 

John Hewson commented on PDFBOX-52:
---

I don't understand the change you've made in to 1.8 1574180. JPEG handling is 
already implemented in PDJpeg and it has special handling for reading and 
writing DCTFilter streams. The only thing missing was that this code is not in 
a logical place, it should be in DCTFilter which it is now in 2.0.



[jira] [Comment Edited] (PDFBOX-52) DCTFilter is not implemented yet

2014-03-04 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919857#comment-13919857
 ] 

John Hewson edited comment on PDFBOX-52 at 3/4/14 8:01 PM:
---

I don't understand the change you've made to 1.8 in 1574180. JPEG handling is 
already implemented in PDJpeg and it has special handling for reading and 
writing DCTFilter streams. The only thing missing was that this code is not in 
a logical place, it should be in DCTFilter which it is now in 2.0.


was (Author: jahewson):
I don't understand the change you've made in to 1.8 1574180. JPEG handling is 
already implemented in PDJpeg and it has special handing for reading and 
writing DCTFilter streams. The only thing missing was that this code is not in 
a logical place, it should be in DCTFilter which it is now in 2.0.



[jira] [Commented] (PDFBOX-52) DCTFilter is not implemented yet

2014-03-04 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919864#comment-13919864
 ] 

Tilman Hausherr commented on PDFBOX-52:
---

Feel free to do what you think is better / best.



[jira] [Comment Edited] (PDFBOX-52) DCTFilter is not implemented yet

2014-03-04 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919857#comment-13919857
 ] 

John Hewson edited comment on PDFBOX-52 at 3/4/14 8:05 PM:
---

I don't understand the change you've made to 1.8 in 1574180. JPEG handling is 
already implemented in PDJpeg and it has special handling for reading and 
writing DCTFilter streams. The only thing missing was that this code is not in 
a logical place, it should be in DCTFilter which it is now in 2.0.

Stream handling in 1.8 is special for JPEGs, the DCTFilter is never used and 
should not be used.




[jira] [Updated] (PDFBOX-1931) Radial shading is missing

2014-03-04 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-1931:


Attachment: pdfbox-1931.pdf-1.png

The shading works with the 1.8 branch (although with the wrong color).

> Radial shading is missing
> -
>
> Key: PDFBOX-1931
> URL: https://issues.apache.org/jira/browse/PDFBOX-1931
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.0
>Reporter: John Hewson
>  Labels: shading, shadingpattern
> Attachments: pdfbox-1931.pdf-1.png, uniekekans_test.pdf
>
>
> The attached file contains a radial shading fill which is missing. 
> RadialShadingContext#calculateInputValues is returning NaN which seems to be 
> incorrect.





Re: Remove AWT Fonts

2014-03-04 Thread John Hewson
Maruan

> what I was having in mind is something similar to Apache FOP’s auto detect 
> feature for fonts.

Yeah, this looks good, we could use this for finding missing embedded fonts.

> For inclusion these are some additional candidates
> 
> https://fedorahosted.org/liberation-fonts/ (SIL licensed 
> http://scripts.sil.org/cms/scripts/page.php?item_id=OFL-FAQ_web&_sc=1#68092c0f)
> http://dejavu-fonts.org/ (http://dejavu-fonts.org/wiki/License)
> Croscore fonts 
> https://fedoraproject.org/wiki/I18N/Liberation_vs_Croscore_fonts

Great, I’ll take a look.

> I’d think if we can avoid bundling a set of fonts but use OS fonts and/or 
> allow people to use their own will help us in the long run as if the quality 
> is not inline with the ones used by Adobe Reader there will be additional 
> questions/issues/bug reports we are not able to resolve.

We still need to ship a set of standard 14 fonts to solve the problems with
platforms which don’t have these fonts or have poor quality substitutes. The
ideal solution is to bundle our own high quality fonts and not depend on
proprietary, platform-specific fonts. If we can’t do this for some reason
(e.g. quality), then we can reluctantly make use of platform fonts.

-- John

On 4 Mar 2014, at 11:45, Maruan Sahyoun  wrote:

> Hi John,
> 
> what I was having in mind is something similar to Apache FOP’s auto detect 
> feature for fonts.
> 
> doc: https://xmlgraphics.apache.org/fop/1.1/fonts.html
> code: 
> http://svn.apache.org/viewvc/xmlgraphics/fop/trunk/src/java/org/apache/fop/fonts/autodetect/
> 
> For inclusion these are some additional candidates
> 
> https://fedorahosted.org/liberation-fonts/ (SIL licensed 
> http://scripts.sil.org/cms/scripts/page.php?item_id=OFL-FAQ_web&_sc=1#68092c0f)
> http://dejavu-fonts.org/ (http://dejavu-fonts.org/wiki/License)
> Croscore fonts 
> https://fedoraproject.org/wiki/I18N/Liberation_vs_Croscore_fonts
> 
> 
> I’d think if we can avoid bundling a set of fonts but use OS fonts and/or 
> allow people to use their own will help us in the long run as if the quality 
> is not inline with the ones used by Adobe Reader there will be additional 
> questions/issues/bug reports we are not able to resolve.
> 
> BR
> 
> Maruan Sahyoun
> 
> On 04.03.2014 at 19:34, John Hewson wrote:
> 
>> Hi Maruan
>> 
>> Java provides access to platform fonts via AWT and does not reveal the paths
>> to the fonts which it finds, so it is not practical to use platform fonts
>> without using AWT. There have also been a number of problems with some unix
>> platforms which lack some of the standard 14 fonts or which ship with poor
>> quality substitutes. Ideally, PDFBox should produce the same result
>> irrespective of which platform it is running on, much like Adobe Reader
>> (excluding any missing embedded fonts, of course).
>> 
>> I’ve had poor experiences in the past with the Nimbus family of fonts from
>> URW++ but there are numerous factors (kerning, hinting, metrics, TTF vs
>> Type 1) which may have changed since then. We should check out how well these
>> fonts compare with the standard 14 used by Adobe, in particular whether or
>> not the metrics actually match (I know that it is claimed that they do).
>> 
>> -- John
>> 
>> On 4 Mar 2014, at 05:48, Maruan Sahyoun  wrote:
>> 
>>> Hi John,
>>> 
>>> what about just using the platform fonts? If not then Latex uses the URW++ 
>>> fonts which were made available under the http://www.latex-project.org/lppl 
>>> license. (same fonts are used by Ghostscript). Could check if the license 
>>> is fine with ours.
>>> 
>>> BR
>>> Maruan Sahyoun
>>> 
>>> On 03.03.2014 at 21:20, John Hewson wrote:
>>> 
>>>> Hi All
>>>> 
>>>> I wanted to bring PDFBOX-1959 to the attention of the mailing list. PDFBox 
>>>> is ready to leave AWT font rendering behind as the JDK's rendering has 
>>>> proven to be buggy and we now have our own renderers for all font types in 
>>>> 2.0.0.
>>>> 
>>>> Before we can do this we need to ship a set of standard 14 fonts with 
>>>> PDFBox as currently the system fonts are being used via AWT. We also need 
>>>> to provide a mechanism for the user to supply their own external fonts for 
>>>> cases where embedded fonts are missing. 
>>>> 
>>>> The main question is, what fonts should we ship? Some of the "free" fonts 
>>>> I've seen render very poorly, any suggestions? Furthermore, are there 
>>>> fonts under more restrictive licenses which we could ship? Apache does 
>>>> allow for such files to be part of a project under certain conditions.
>>>> 
>>>> Also: Adobe has some font packs, e.g. Japanese, which we could point users 
>>>> towards.
>>>> 
>>>> Cheers
>>>> 
>>>> -- John
>>> 
>> 
> 



[jira] [Commented] (PDFBOX-52) DCTFilter is not implemented yet

2014-03-04 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919891#comment-13919891
 ] 

John Hewson commented on PDFBOX-52:
---

I don't see that there is anything that can be done in the 1.8 branch. The 
functionality of DCTFilter lives in PDJpeg and DCTFilter isn't used. The 
changes in 1574180 risk breaking things or confusing users who should be using 
PDJpeg, so should probably be reversed. Sorry!



Re: Remove AWT Fonts

2014-03-04 Thread Maruan Sahyoun
John,

I don’t understand why we have to ship fonts. Until now we didn’t ship fonts
but depended on platform fonts through AWT, so the situation won’t change.

For legal reasons we won’t be able to use the fonts Adobe uses, and I doubt that
there are open source fonts which provide the same results (rendering quality,
number of glyphs, ...). So I think a mechanism to use platform fonts and to let
users register new ones, similar to our current font aliases, is a better and
more reliable option.

BR
Maruan Sahyoun
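
[Illustration] The lookup order proposed in this message (user-registered fonts first, then platform fonts via aliases) could be sketched roughly as follows. This is only a sketch under assumptions: the class FontAliasSketch and all of its methods are hypothetical and are not PDFBox API. The set of installed family names is passed in by the caller; in a real setup it might come from java.awt.GraphicsEnvironment#getAvailableFontFamilyNames().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical font resolver for illustration; not part of PDFBox. */
public class FontAliasSketch {
    // PDF base font name -> path of a font file supplied by the user
    private final Map<String, String> userFonts = new HashMap<>();
    // PDF base font name -> platform family names to try, in order
    private final Map<String, List<String>> aliases = new HashMap<>();
    // Family names assumed to be installed on the platform
    private final Set<String> platformFamilies;

    public FontAliasSketch(Collection<String> platformFamilies) {
        this.platformFamilies = new HashSet<>(platformFamilies);
    }

    public void registerUserFont(String pdfName, String fontFilePath) {
        userFonts.put(pdfName, fontFilePath);
    }

    public void addAlias(String pdfName, String platformFamily) {
        aliases.computeIfAbsent(pdfName, k -> new ArrayList<>()).add(platformFamily);
    }

    /** User-registered fonts win, then the first installed alias, else empty. */
    public Optional<String> resolve(String pdfName) {
        String userFont = userFonts.get(pdfName);
        if (userFont != null) {
            return Optional.of("file:" + userFont);
        }
        for (String family : aliases.getOrDefault(pdfName, Collections.emptyList())) {
            if (platformFamilies.contains(family)) {
                return Optional.of("family:" + family);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        FontAliasSketch fonts = new FontAliasSketch(Arrays.asList("Arial", "Liberation Sans"));
        fonts.addAlias("Helvetica", "Helvetica");
        fonts.addAlias("Helvetica", "Arial");
        fonts.registerUserFont("Times-Roman", "/fonts/MyTimes.ttf");
        System.out.println(fonts.resolve("Helvetica").orElse("missing"));
        System.out.println(fonts.resolve("Times-Roman").orElse("missing"));
        System.out.println(fonts.resolve("Symbol").orElse("missing"));
    }
}
```

A resolver of this shape returns nothing when neither a user font nor an installed alias exists, which is exactly the case where a bundled fallback set would matter.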


[jira] [Commented] (PDFBOX-52) DCTFilter is not implemented yet

2014-03-04 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919981#comment-13919981
 ] 

Tilman Hausherr commented on PDFBOX-52:
---

Prove it by showing a case that is broken by my change.
I could of course simply add the stuff that is in JPXFilter into DCTFilter, but 
why have something twice?

The whole 1.8 branch is broken compared to the trunk. I just rendered all my 
images with the 1.8 branch, the result is terrible. Although there are a few 
cases where I get better results with 1.8.



Re: Remove AWT Fonts

2014-03-04 Thread John Hewson
We have open issues due to missing or buggy platform fonts for the standard 14
set. Continuing to use platform fonts will not solve this problem.

We should try to find some good open source fonts; if they are not good enough
then we can look at other options, but we should at least try.

-- John


[jira] [Commented] (PDFBOX-52) DCTFilter is not implemented yet

2014-03-04 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920018#comment-13920018
 ] 

John Hewson commented on PDFBOX-52:
---

{quote}
Prove it by showing a case that is broken by my change.
{quote}

This is the wrong mindset, 1.8 is stable, you need to prove that your fix was 
needed and show what it fixes, otherwise it is by definition an unnecessary 
risk.



[jira] [Comment Edited] (PDFBOX-52) DCTFilter is not implemented yet

2014-03-04 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920018#comment-13920018
 ] 

John Hewson edited comment on PDFBOX-52 at 3/4/14 9:06 PM:
---

{quote}
Prove it by showing a case that is broken by my change.
{quote}

This is the wrong mindset, 1.8 is stable, you need to prove that your fix was 
needed and show what it fixes, otherwise it is by definition an unnecessary 
risk. Adding a redundant and possibly broken method to read JPEG files which is 
never actually used doesn't seem like something which should go into a stable 
release.




[jira] [Commented] (PDFBOX-52) DCTFilter is not implemented yet

2014-03-04 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920051#comment-13920051
 ] 

Tilman Hausherr commented on PDFBOX-52:
---

My fix was needed because the DCT image wasn't decoded since 2005. The fix 
decodes the image. The previous code did call the DCT filter for DCT encoded 
images and got an exception that it is not implemented. The new code uses the 
JPX filter instead, which is nothing more than a call to java imageio. It ran 
on my system for 2 months. Your theory is a rather philosophical one. Or 
rather, it is an opinion.

This discussion is going in a bad direction, one that feels like Wikipedia:
potentially endless discussions (which keep me from writing and testing code)
where the result depends on who has more time. I don't have a personal interest
in that change because I use the 2.0 version. I did it for the user who didn't
understand that the 1.8 version didn't have DCT.
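
[Illustration] For readers unfamiliar with what "a call to java imageio" amounts to, here is a minimal sketch of decoding DCT (JPEG) bytes with the JDK's javax.imageio.ImageIO. It is not the code from revision 1574180: the class DctDecodeSketch and the method decodeJpeg are hypothetical names, and the sample image is generated in memory only so that the example is self-contained.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Hypothetical helper for illustration; not the PDFBox filter code. */
public class DctDecodeSketch {

    /** Decodes JPEG bytes using whichever ImageIO reader claims the stream. */
    static BufferedImage decodeJpeg(byte[] encoded) throws IOException {
        BufferedImage image = ImageIO.read(new ByteArrayInputStream(encoded));
        if (image == null) {
            // ImageIO.read returns null when no registered reader accepts the data
            throw new IOException("No ImageIO reader could decode the stream");
        }
        return image;
    }

    public static void main(String[] args) throws IOException {
        ImageIO.setUseCache(false); // keep everything in memory, no temp files

        // Encode a small blank image as JPEG to obtain test bytes
        BufferedImage source = new BufferedImage(8, 8, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(source, "jpg", out);

        BufferedImage decoded = decodeJpeg(out.toByteArray());
        System.out.println(decoded.getWidth() + "x" + decoded.getHeight());
    }
}
```

The null check matters because ImageIO.read signals "no suitable reader" by returning null rather than by throwing.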



[jira] [Commented] (PDFBOX-52) DCTFilter is not implemented yet

2014-03-04 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920099#comment-13920099
 ] 

John Hewson commented on PDFBOX-52:
---

{quote}
My fix was needed because the DCT image wasn't decoded since 2005.
{quote}

Right but that's because DCT handling is done by PDJpeg, it was a design choice 
(not a great one). It's not a bug or a missing feature. Now the confused user 
is even more confused because they should be using PDJpeg directly rather than 
messing with DCTFilter.

I just don't think we should be wasting time fixing 9 year old issues which 
don't need fixing.



[jira] [Commented] (PDFBOX-52) DCTFilter is not implemented yet

2014-03-04 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920113#comment-13920113
 ] 

Tilman Hausherr commented on PDFBOX-52:
---

I don't get it. The user wouldn't have used DCTFilter anyway, because that one 
wasn't implemented.

"9 year old issues which don't need fixing" ? Of course it needed fixing. 
Previously the image couldn't be decoded, now it can. I've asked the guy on the 
user list to send his image.



[jira] [Commented] (PDFBOX-52) DCTFilter is not implemented yet

2014-03-04 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920123#comment-13920123
 ] 

John Hewson commented on PDFBOX-52:
---

Ah, I misunderstood what was going on, now I see it's an issue with inline 
images making use of DCTFilter. Normally DCTFilter is never actually called by 
PDFBox code, as DCT handling is done by PDJpeg. However, PDJpeg is an XObject 
subclass so it isn't being instantiated for inline images. I was under the 
impression that DCTFilter was never called, but obviously it is. As a quick 
hack your fix seems perfectly reasonable, apologies for my misunderstanding :)
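
[Illustration] The two code paths described in this comment can be pictured with a toy model. Everything below is hypothetical and heavily simplified: it is not PDFBox source, strings stand in for image data, and only the filter names and the warning text are taken from the thread. It merely shows why an XObject image can bypass an unimplemented filter while an inline image cannot.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Toy model of the dispatch described above; all names are hypothetical. */
public class InlineImageDispatchSketch {
    // Stand-in for a filter registry keyed by PDF filter name
    static final Map<String, UnaryOperator<String>> FILTERS = new HashMap<>();
    static {
        FILTERS.put("FlateDecode", data -> "inflated(" + data + ")");
        FILTERS.put("DCTDecode", data -> {
            throw new UnsupportedOperationException(
                    "DCTFilter.decode is not implemented yet");
        });
    }

    // Image XObjects: a dedicated JPEG class handles DCT data itself,
    // so the registry entry for DCTDecode is never consulted.
    static String decodeXObject(String filterName, String data) {
        if (filterName.equals("DCTDecode")) {
            return "jpegClassDecoded(" + data + ")";
        }
        return FILTERS.get(filterName).apply(data);
    }

    // Inline images: no dedicated class is instantiated, so every
    // filter, including DCTDecode, goes through the registry.
    static String decodeInlineImage(String filterName, String data) {
        return FILTERS.get(filterName).apply(data);
    }

    public static void main(String[] args) {
        System.out.println(decodeXObject("DCTDecode", "bytes"));
        try {
            decodeInlineImage("DCTDecode", "bytes");
        } catch (UnsupportedOperationException e) {
            System.out.println("inline image failed: " + e.getMessage());
        }
    }
}
```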



[jira] [Comment Edited] (PDFBOX-52) DCTFilter is not implemented yet

2014-03-04 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920018#comment-13920018
 ] 

John Hewson edited comment on PDFBOX-52 at 3/4/14 10:29 PM:


{quote}
Prove it by showing a case that is broken by my change.
{quote}

-This is the wrong mindset, 1.8 is stable, you need to prove that your fix was 
needed and show what it fixes, otherwise it is by definition an unnecessary 
risk. Adding a redundant and possibly broken method to read JPEG files which is 
never actually used doesn't seem like something which should go into a stable 
release.-


was (Author: jahewson):
{quote}
Prove it by showing a case that is broken by my change.
{quote}

This is the wrong mindset, 1.8 is stable, you need to prove that your fix was 
needed and show what it fixes, otherwise it is by definition an unnecessary 
risk. Adding a redundant and possibly broken method to read JPEG files which is 
never actually used doesn't seem like something which should go into a stable 
release.

> DCTFilter is not implemented yet
> 
>
> Key: PDFBOX-52
> URL: https://issues.apache.org/jira/browse/PDFBOX-52
> Project: PDFBox
>  Issue Type: New Feature
>  Components: Rendering
>Assignee: Tilman Hausherr
> Fix For: 1.8.5, 2.0.0
>
> Attachments: FilterManager.patch, JPXFilter.patch, 
> amyuni2_05d__pdf1_3_acro4x.pdf-1.png, dctfilter.patch
>
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552835&aid=1181506
> Originally submitted by joswol on 2005-04-12 07:03.
> PDFBox-0.7.1: org.pdfbox.filter.DCTFilter  - Warning:
> DCTFilter.decode is not implemented yet, skipping this
> stream.





[jira] [Comment Edited] (PDFBOX-52) DCTFilter is not implemented yet

2014-03-04 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920099#comment-13920099
 ] 

John Hewson edited comment on PDFBOX-52 at 3/4/14 10:29 PM:


{quote}
My fix was needed because the DCT image wasn't decoded since 2005.
{quote}

Right but that's because DCT handling is done by PDJpeg, it was a design choice 
(not a great one). -It's not a bug or a missing feature. Now the confused user 
is even more confused because they should be using PDJpeg directly rather than 
messing with DCTFilter.

I just don't think we should be wasting time fixing 9 year old issues which 
don't need fixing.-


was (Author: jahewson):
{quote}
My fix was needed because the DCT image wasn't decoded since 2005.
{quote}

Right but that's because DCT handling is done by PDJpeg, it was a design choice 
(not a great one). It's not a bug or a missing feature. Now the confused user 
is even more confused because they should be using PDJpeg directly rather than 
messing with DCTFilter.

I just don't think we should be wasting time fixing 9 year old issues which 
don't need fixing.






[jira] [Comment Edited] (PDFBOX-52) DCTFilter is not implemented yet

2014-03-04 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920099#comment-13920099
 ] 

John Hewson edited comment on PDFBOX-52 at 3/4/14 10:29 PM:


{quote}
My fix was needed because the DCT image wasn't decoded since 2005.
{quote}

Right but that's because DCT handling is done by PDJpeg, it was a design choice 
(not a great one). -It's not a bug or a missing feature. Now the confused user 
is even more confused because they should be using PDJpeg directly rather than 
messing with DCTFilter.-

-I just don't think we should be wasting time fixing 9 year old issues which 
don't need fixing.-


was (Author: jahewson):
{quote}
My fix was needed because the DCT image wasn't decoded since 2005.
{quote}

Right but that's because DCT handling is done by PDJpeg, it was a design choice 
(not a great one). -It's not a bug or a missing feature. Now the confused user 
is even more confused because they should be using PDJpeg directly rather than 
messing with DCTFilter.

I just don't think we should be wasting time fixing 9 year old issues which 
don't need fixing.-






[jira] [Created] (PDFBOX-1960) Matrix is wrong

2014-03-04 Thread John Hewson (JIRA)
John Hewson created PDFBOX-1960:
---

 Summary: Matrix is wrong
 Key: PDFBOX-1960
 URL: https://issues.apache.org/jira/browse/PDFBOX-1960
 Project: PDFBox
  Issue Type: Bug
Reporter: John Hewson
Priority: Critical


I've been driven insane recently by trying to get pattern fills to render 
correctly. Patterns have their own matrix which is concatenated to the CTM and 
no matter how I applied the transformation, the results were wrong.

It turns out that {{org.apache.pdfbox.util.Matrix}} is not behaving as 
expected, here's an example from a pattern I'm working on. I performed the same 
concatenation (i.e. multiplication) using our Matrix and Java's 
AffineTransform, the results are as follows:

Java: AffineTransform[[2.0, 0.0, 1.251E-12], [-0.0, 2.0, 1684.0]] *
AffineTransform[[0.6, 0.0, 302.6], [0.0, 0.6, 1091.38]] =
[[0.600014,0.0,0.0][0.0,0.6,0.0][605.21,2856.34,1.0]]

PDFBox: [[2.0,-0.0,0.0][0.0,2.0,0.0][1.2505552E-12,1684.0,1.0]] * 
[[0.3,0.0,0.0][0.0,0.37,0.0][302.60,586.17,1.0]] = 
[[0.6,0.0,0.0][0.0,0.6,0.0][302.608,1091.38,1.0]]

I suggest that we remove Matrix and replace it with AffineTransform.
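
One possible reading of the numbers above, offered as a sketch rather than a diagnosis, is that the two results differ only in the order in which the operands are applied. The example below assumes the operands are a scale by 2 with translation (0, 1684) and a scale by 0.3 with translation (302.6, 586.17); the 1.251E-12 term is rounded to zero and 0.37 is treated as 0.3. A hand-written row-vector product stands in for org.apache.pdfbox.util.Matrix, and the class and method names are hypothetical. Under these assumptions the row-vector product and preConcatenate both give a translation of about (302.6, 1091.37), while concatenate gives about (605.2, 2856.34).

```java
import java.awt.geom.AffineTransform;

// Hypothetical sketch: a hand-written row-vector product stands in for
// org.apache.pdfbox.util.Matrix, whose internals are not shown in this thread.
final class ConcatOrderSketch {

    // PDF-style layout [a b 0; c d 0; e f 1]; points are row vectors [x y 1].
    static double[][] pdfMatrix(double a, double b, double c, double d, double e, double f) {
        return new double[][] {{a, b, 0}, {c, d, 0}, {e, f, 1}};
    }

    // Plain 3x3 product. With row vectors, first * second applies first, then second.
    static double[][] multiply(double[][] first, double[][] second) {
        double[][] r = new double[3][3];
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                for (int k = 0; k < 3; k++) {
                    r[i][j] += first[i][k] * second[k][j];
                }
            }
        }
        return r;
    }

    // Translation part after a.concatenate(b): b acts on a point first, then a.
    static double[] concatenate(AffineTransform a, AffineTransform b) {
        AffineTransform t = new AffineTransform(a);
        t.concatenate(b);
        return new double[] {t.getTranslateX(), t.getTranslateY()};
    }

    // Translation part after a.preConcatenate(b): a acts on a point first, then b.
    static double[] preConcatenate(AffineTransform a, AffineTransform b) {
        AffineTransform t = new AffineTransform(a);
        t.preConcatenate(b);
        return new double[] {t.getTranslateX(), t.getTranslateY()};
    }

    public static void main(String[] args) {
        // Assumed operands (see the note above the code).
        double[][] p = multiply(pdfMatrix(2, 0, 0, 2, 0, 1684),
                                pdfMatrix(0.3, 0, 0, 0.3, 302.6, 586.17));
        AffineTransform a = new AffineTransform(2, 0, 0, 2, 0, 1684);
        AffineTransform b = new AffineTransform(0.3, 0, 0, 0.3, 302.6, 586.17);
        double[] c = concatenate(a, b);
        double[] pc = preConcatenate(a, b);
        System.out.printf("row-vector product: %.2f %.2f%n", p[2][0], p[2][1]);
        System.out.printf("a.concatenate(b):    %.2f %.2f%n", c[0], c[1]);
        System.out.printf("a.preConcatenate(b): %.2f %.2f%n", pc[0], pc[1]);
    }
}
```

If this reading is right, neither class is miscalculating; the operand order simply has to match the convention of the class being called.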





[jira] [Updated] (PDFBOX-1960) Matrix is wrong

2014-03-04 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1960:


Affects Version/s: 2.0.0
   1.8.4

> Matrix is wrong
> ---
>
> Key: PDFBOX-1960
> URL: https://issues.apache.org/jira/browse/PDFBOX-1960
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.4, 2.0.0
>Reporter: John Hewson
>Priority: Critical
>
> I've been driven insane recently by trying to get pattern fills to render 
> correctly. Patterns have their own matrix which is concatenated to the CTM 
> and no matter how I applied the transformation, the results were wrong.
> It turns out that {{org.apache.pdfbox.util.Matrix}} is not behaving as 
> expected, here's an example from a pattern I'm working on. I performed the 
> same concatenation (i.e. multiplication) using our Matrix and Java's 
> AffineTransform, the results are as follows:
> Java: AffineTransform[[2.0, 0.0, 1.251E-12], [-0.0, 2.0, 1684.0]] *
> AffineTransform[[0.6, 0.0, 302.6], [0.0, 0.6, 1091.38]] =
> [[0.600014,0.0,0.0][0.0,0.6,0.0][605.21,2856.34,1.0]]
> PDFBox: [[2.0,-0.0,0.0][0.0,2.0,0.0][1.2505552E-12,1684.0,1.0]] * 
> [[0.3,0.0,0.0][0.0,0.37,0.0][302.60,586.17,1.0]] = 
> [[0.6,0.0,0.0][0.0,0.6,0.0][302.608,1091.38,1.0]]
> I suggest that we remove Matrix and replace it with AffineTransform.





[jira] [Updated] (PDFBOX-1960) Matrix is wrong

2014-03-04 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1960:


Description: 
I've been driven insane recently by trying to get pattern fills to render 
correctly. Patterns have their own matrix which is concatenated to the CTM and 
no matter how I applied the transformation, the results were wrong.

It turns out that {{org.apache.pdfbox.util.Matrix}} is not behaving as 
expected, here's an example from a pattern I'm working on. I performed the same 
concatenation (i.e. multiplication) using our Matrix and Java's 
AffineTransform, the results are as follows:

Java AffineTransform:
[[2.0, 0.0, 1.251E-12], [-0.0, 2.0, 1684.0]] *
[[0.6, 0.0, 302.6], [0.0, 0.6, 1091.38]] =
[[0.600014,0.0,0.0][0.0,0.6,0.0][605.21,2856.34,1.0]]

PDFBox Matrix:
[[2.0,-0.0,0.0][0.0,2.0,0.0][1.2505552E-12,1684.0,1.0]] *
[[0.3,0.0,0.0][0.0,0.37,0.0][302.60,586.17,1.0]] =
[[0.6,0.0,0.0][0.0,0.6,0.0][302.608,1091.38,1.0]]

I suggest that we remove Matrix and replace it with AffineTransform.

  was:
I've been driven insane recently by trying to get pattern fills to render 
correctly. Patterns have their own matrix which is concatenated to the CTM and 
no matter how I applied the transformation, the results were wrong.

It turns out that {{org.apache.pdfbox.util.Matrix}} is not behaving as 
expected, here's an example from a pattern I'm working on. I performed the same 
concatenation (i.e. multiplication) using our Matrix and Java's 
AffineTransform, the results are as follows:

Java: AffineTransform[[2.0, 0.0, 1.251E-12], [-0.0, 2.0, 1684.0]] *
AffineTransform[[0.6, 0.0, 302.6], [0.0, 0.6, 1091.38]] =
[[0.600014,0.0,0.0][0.0,0.6,0.0][605.21,2856.34,1.0]]

PDFBox: [[2.0,-0.0,0.0][0.0,2.0,0.0][1.2505552E-12,1684.0,1.0]] *
[[0.3,0.0,0.0][0.0,0.37,0.0][302.60,586.17,1.0]] =
[[0.6,0.0,0.0][0.0,0.6,0.0][302.608,1091.38,1.0]]

I suggest that we remove Matrix and replace it with AffineTransform.


> Matrix is wrong
> ---
>
> Key: PDFBOX-1960
> URL: https://issues.apache.org/jira/browse/PDFBOX-1960
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.4, 2.0.0
>Reporter: John Hewson
>Priority: Critical
>
> I've been driven insane recently by trying to get pattern fills to render 
> correctly. Patterns have their own matrix which is concatenated to the CTM 
> and no matter how I applied the transformation, the results were wrong.
> It turns out that {{org.apache.pdfbox.util.Matrix}} is not behaving as 
> expected, here's an example from a pattern I'm working on. I performed the 
> same concatenation (i.e. multiplication) using our Matrix and Java's 
> AffineTransform, the results are as follows:
> Java AffineTransform:
> [[2.0, 0.0, 1.251E-12], [-0.0, 2.0, 1684.0]] *
> [[0.6, 0.0, 302.6], [0.0, 0.6, 1091.38]] =
> [[0.600014,0.0,0.0][0.0,0.6,0.0][605.21,2856.34,1.0]]
> PDFBox Matrix:
> [[2.0,-0.0,0.0][0.0,2.0,0.0][1.2505552E-12,1684.0,1.0]] *
> [[0.3,0.0,0.0][0.0,0.37,0.0][302.60,586.17,1.0]] =
> [[0.6,0.0,0.0][0.0,0.6,0.0][302.608,1091.38,1.0]]
> I suggest that we remove Matrix and replace it with AffineTransform.





[jira] [Updated] (PDFBOX-1960) Matrix is wrong

2014-03-04 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1960:


Description: 
I've been driven insane recently by trying to get pattern fills to render 
correctly. Patterns have their own matrix which is concatenated to the CTM and 
no matter how I applied the transformation, the results were wrong.

It turns out that {{org.apache.pdfbox.util.Matrix}} is not behaving as 
expected, here's an example from a pattern I'm working on. I performed the same 
concatenation (i.e. multiplication) using our Matrix and Java's 
AffineTransform, the results are as follows:

Java AffineTransform:
[[2.0, 0.0, 1.251E-12], [-0.0, 2.0, 1684.0]] *
[[0.6, 0.0, 302.6], [0.0, 0.6, 1091.38]] =
[[0.600014,0.0,0.0][0.0,0.6,0.0][605.21,2856.34,1.0]]

PDFBox Matrix:
[[2.0,-0.0,0.0][0.0,2.0,0.0][1.251E-12,1684.0,1.0]] *
[[0.3,0.0,0.0][0.0,0.37,0.0][302.60,586.17,1.0]] =
[[0.6,0.0,0.0][0.0,0.6,0.0][302.608,1091.38,1.0]]

I suggest that we remove Matrix and replace it with AffineTransform.

  was:
I've been driven insane recently by trying to get pattern fills to render 
correctly. Patterns have their own matrix which is concatenated to the CTM and 
no matter how I applied the transformation, the results were wrong.

It turns out that {{org.apache.pdfbox.util.Matrix}} is not behaving as 
expected, here's an example from a pattern I'm working on. I performed the same 
concatenation (i.e. multiplication) using our Matrix and Java's 
AffineTransform, the results are as follows:

Java AffineTransform:
[[2.0, 0.0, 1.251E-12], [-0.0, 2.0, 1684.0]] *
[[0.6, 0.0, 302.6], [0.0, 0.6, 1091.38]] =
[[0.600014,0.0,0.0][0.0,0.6,0.0][605.21,2856.34,1.0]]

PDFBox Matrix:
[[2.0,-0.0,0.0][0.0,2.0,0.0][1.2505552E-12,1684.0,1.0]] *
[[0.3,0.0,0.0][0.0,0.37,0.0][302.60,586.17,1.0]] =
[[0.6,0.0,0.0][0.0,0.6,0.0][302.608,1091.38,1.0]]

I suggest that we remove Matrix and replace it with AffineTransform.


> Matrix is wrong
> ---
>
> Key: PDFBOX-1960
> URL: https://issues.apache.org/jira/browse/PDFBOX-1960
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.4, 2.0.0
>Reporter: John Hewson
>Priority: Critical
>
> I've been driven insane recently by trying to get pattern fills to render 
> correctly. Patterns have their own matrix which is concatenated to the CTM 
> and no matter how I applied the transformation, the results were wrong.
> It turns out that {{org.apache.pdfbox.util.Matrix}} is not behaving as 
> expected, here's an example from a pattern I'm working on. I performed the 
> same concatenation (i.e. multiplication) using our Matrix and Java's 
> AffineTransform, the results are as follows:
> Java AffineTransform:
> [[2.0, 0.0, 1.251E-12], [-0.0, 2.0, 1684.0]] *
> [[0.6, 0.0, 302.6], [0.0, 0.6, 1091.38]] =
> [[0.600014,0.0,0.0][0.0,0.6,0.0][605.21,2856.34,1.0]]
> PDFBox Matrix:
> [[2.0,-0.0,0.0][0.0,2.0,0.0][1.251E-12,1684.0,1.0]] *
> [[0.3,0.0,0.0][0.0,0.37,0.0][302.60,586.17,1.0]] =
> [[0.6,0.0,0.0][0.0,0.6,0.0][302.608,1091.38,1.0]]
> I suggest that we remove Matrix and replace it with AffineTransform.





[jira] [Updated] (PDFBOX-1960) Matrix is wrong

2014-03-04 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1960:


Description: 
I've been driven insane recently by trying to get pattern fills to render 
correctly. Patterns have their own matrix which is concatenated to the CTM and 
no matter how I applied the transformation, the results were wrong.

It turns out that {{org.apache.pdfbox.util.Matrix}} is not behaving as 
expected, here's an example from a pattern I'm working on. I performed the same 
concatenation (i.e. multiplication) using our Matrix and Java's 
AffineTransform, the results are as follows:

Java: AffineTransform[[2.0, 0.0, 1.251E-12], [-0.0, 2.0, 1684.0]] *
AffineTransform[[0.6, 0.0, 302.6], [0.0, 0.6, 1091.38]] =
[[0.600014,0.0,0.0][0.0,0.6,0.0][605.21,2856.34,1.0]]

PDFBox: [[2.0,-0.0,0.0][0.0,2.0,0.0][1.2505552E-12,1684.0,1.0]] *
[[0.3,0.0,0.0][0.0,0.37,0.0][302.60,586.17,1.0]] =
[[0.6,0.0,0.0][0.0,0.6,0.0][302.608,1091.38,1.0]]

I suggest that we remove Matrix and replace it with AffineTransform.

  was:
I've been driven insane recently by trying to get pattern fills to render 
correctly. Patterns have their own matrix which is concatenated to the CTM and 
no matter how I applied the transformation, the results were wrong.

It turns out that {{org.apache.pdfbox.util.Matrix}} is not behaving as 
expected, here's an example from a pattern I'm working on. I performed the same 
concatenation (i.e. multiplication) using our Matrix and Java's 
AffineTransform, the results are as follows:

Java: AffineTransform[[2.0, 0.0, 1.251E-12], [-0.0, 2.0, 1684.0]] *
AffineTransform[[0.6, 0.0, 302.6], [0.0, 0.6, 1091.38]] =
[[0.600014,0.0,0.0][0.0,0.6,0.0][605.21,2856.34,1.0]]

PDFBox: [[2.0,-0.0,0.0][0.0,2.0,0.0][1.2505552E-12,1684.0,1.0]] * 
[[0.3,0.0,0.0][0.0,0.37,0.0][302.60,586.17,1.0]] = 
[[0.6,0.0,0.0][0.0,0.6,0.0][302.608,1091.38,1.0]]

I suggest that we remove Matrix and replace it with AffineTransform.


> Matrix is wrong
> ---
>
> Key: PDFBOX-1960
> URL: https://issues.apache.org/jira/browse/PDFBOX-1960
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.4, 2.0.0
>Reporter: John Hewson
>Priority: Critical
>
> I've been driven insane recently by trying to get pattern fills to render 
> correctly. Patterns have their own matrix which is concatenated to the CTM 
> and no matter how I applied the transformation, the results were wrong.
> It turns out that {{org.apache.pdfbox.util.Matrix}} is not behaving as 
> expected, here's an example from a pattern I'm working on. I performed the 
> same concatenation (i.e. multiplication) using our Matrix and Java's 
> AffineTransform, the results are as follows:
> Java: AffineTransform[[2.0, 0.0, 1.251E-12], [-0.0, 2.0, 1684.0]] *
> AffineTransform[[0.6, 0.0, 302.6], [0.0, 0.6, 1091.38]] =
> [[0.600014,0.0,0.0][0.0,0.6,0.0][605.21,2856.34,1.0]]
> PDFBox: [[2.0,-0.0,0.0][0.0,2.0,0.0][1.2505552E-12,1684.0,1.0]] *
> [[0.3,0.0,0.0][0.0,0.37,0.0][302.60,586.17,1.0]] =
> [[0.6,0.0,0.0][0.0,0.6,0.0][302.608,1091.38,1.0]]
> I suggest that we remove Matrix and replace it with AffineTransform.





[jira] [Updated] (PDFBOX-1960) Matrix is wrong

2014-03-04 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1960:


Description: 
I've been driven insane recently by trying to get pattern fills to render 
correctly. Patterns have their own matrix which is concatenated to the CTM and 
no matter how I applied the transformation, the results were wrong.

It turns out that org.apache.pdfbox.util.Matrix is not behaving as expected, 
here's an example from a pattern I'm working on. I performed the same 
concatenation (i.e. multiplication) using our Matrix and Java's 
AffineTransform, the results are as follows:

Java AffineTransform:
[[2.0, 0.0, 1.251E-12], [-0.0, 2.0, 1684.0]] *
[[0.6, 0.0, 302.6], [0.0, 0.6, 1091.38]] =
[[0.600014,0.0,0.0][0.0,0.6,0.0][605.21,2856.34,1.0]]

PDFBox Matrix:
[[2.0,-0.0,0.0][0.0,2.0,0.0][1.251E-12,1684.0,1.0]] *
[[0.3,0.0,0.0][0.0,0.37,0.0][302.60,586.17,1.0]] =
[[0.6,0.0,0.0][0.0,0.6,0.0][302.608,1091.38,1.0]]

I suggest that we remove Matrix and replace it with AffineTransform.

  was:
I've been driven insane recently by trying to get pattern fills to render 
correctly. Patterns have their own matrix which is concatenated to the CTM and 
no matter how I applied the transformation, the results were wrong.

It turns out that {{org.apache.pdfbox.util.Matrix}} is not behaving as 
expected, here's an example from a pattern I'm working on. I performed the same 
concatenation (i.e. multiplication) using our Matrix and Java's 
AffineTransform, the results are as follows:

Java AffineTransform:
[[2.0, 0.0, 1.251E-12], [-0.0, 2.0, 1684.0]] *
[[0.6, 0.0, 302.6], [0.0, 0.6, 1091.38]] =
[[0.600014,0.0,0.0][0.0,0.6,0.0][605.21,2856.34,1.0]]

PDFBox Matrix:
[[2.0,-0.0,0.0][0.0,2.0,0.0][1.251E-12,1684.0,1.0]] *
[[0.3,0.0,0.0][0.0,0.37,0.0][302.60,586.17,1.0]] =
[[0.6,0.0,0.0][0.0,0.6,0.0][302.608,1091.38,1.0]]

I suggest that we remove Matrix and replace it with AffineTransform.


> Matrix is wrong
> ---
>
> Key: PDFBOX-1960
> URL: https://issues.apache.org/jira/browse/PDFBOX-1960
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.4, 2.0.0
>Reporter: John Hewson
>Priority: Critical
>
> I've been driven insane recently by trying to get pattern fills to render 
> correctly. Patterns have their own matrix which is concatenated to the CTM 
> and no matter how I applied the transformation, the results were wrong.
> It turns out that org.apache.pdfbox.util.Matrix is not behaving as expected, 
> here's an example from a pattern I'm working on. I performed the same 
> concatenation (i.e. multiplication) using our Matrix and Java's 
> AffineTransform, the results are as follows:
> Java AffineTransform:
> [[2.0, 0.0, 1.251E-12], [-0.0, 2.0, 1684.0]] *
> [[0.6, 0.0, 302.6], [0.0, 0.6, 1091.38]] =
> [[0.600014,0.0,0.0][0.0,0.6,0.0][605.21,2856.34,1.0]]
> PDFBox Matrix:
> [[2.0,-0.0,0.0][0.0,2.0,0.0][1.251E-12,1684.0,1.0]] *
> [[0.3,0.0,0.0][0.0,0.37,0.0][302.60,586.17,1.0]] =
> [[0.6,0.0,0.0][0.0,0.6,0.0][302.608,1091.38,1.0]]
> I suggest that we remove Matrix and replace it with AffineTransform.





[jira] [Updated] (PDFBOX-1960) Matrix is wrong

2014-03-04 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1960:


Description: 
I've been driven insane recently by trying to get pattern fills to render 
correctly. Patterns have their own matrix which is concatenated to the CTM and 
no matter how I applied the transformation, the results were wrong.

It turns out that org.apache.pdfbox.util.Matrix is not behaving as expected, 
here's an example from a pattern I'm working on. I performed the same 
concatenation (i.e. multiplication) using our Matrix and Java's 
AffineTransform, the results are as follows:

Java AffineTransform:
[[2.0, 0.0, 1.251E-12], [-0.0, 2.0, 1684.0]] *
[[0.6, 0.0, 302.6], [0.0, 0.6, 1091.38]] =
[[0.6,0.0,0.0][0.0,0.6,0.0][605.21,2856.34,1.0]]

PDFBox Matrix:
[[2.0,-0.0,0.0][0.0,2.0,0.0][1.251E-12,1684.0,1.0]] *
[[0.3,0.0,0.0][0.0,0.37,0.0][302.60,586.17,1.0]] =
[[0.6,0.0,0.0][0.0,0.6,0.0][302.608,1091.38,1.0]]

I suggest that we remove Matrix and replace it with AffineTransform.

  was:
I've been driven insane recently by trying to get pattern fills to render 
correctly. Patterns have their own matrix which is concatenated to the CTM and 
no matter how I applied the transformation, the results were wrong.

It turns out that org.apache.pdfbox.util.Matrix is not behaving as expected, 
here's an example from a pattern I'm working on. I performed the same 
concatenation (i.e. multiplication) using our Matrix and Java's 
AffineTransform, the results are as follows:

Java AffineTransform:
[[2.0, 0.0, 1.251E-12], [-0.0, 2.0, 1684.0]] *
[[0.6, 0.0, 302.6], [0.0, 0.6, 1091.38]] =
[[0.600014,0.0,0.0][0.0,0.6,0.0][605.21,2856.34,1.0]]

PDFBox Matrix:
[[2.0,-0.0,0.0][0.0,2.0,0.0][1.251E-12,1684.0,1.0]] *
[[0.3,0.0,0.0][0.0,0.37,0.0][302.60,586.17,1.0]] =
[[0.6,0.0,0.0][0.0,0.6,0.0][302.608,1091.38,1.0]]

I suggest that we remove Matrix and replace it with AffineTransform.


> Matrix is wrong
> ---
>
> Key: PDFBOX-1960
> URL: https://issues.apache.org/jira/browse/PDFBOX-1960
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.4, 2.0.0
>Reporter: John Hewson
>Priority: Critical
>
> I've been driven insane recently by trying to get pattern fills to render 
> correctly. Patterns have their own matrix which is concatenated to the CTM 
> and no matter how I applied the transformation, the results were wrong.
> It turns out that org.apache.pdfbox.util.Matrix is not behaving as expected, 
> here's an example from a pattern I'm working on. I performed the same 
> concatenation (i.e. multiplication) using our Matrix and Java's 
> AffineTransform, the results are as follows:
> Java AffineTransform:
> [[2.0, 0.0, 1.251E-12], [-0.0, 2.0, 1684.0]] *
> [[0.6, 0.0, 302.6], [0.0, 0.6, 1091.38]] =
> [[0.6,0.0,0.0][0.0,0.6,0.0][605.21,2856.34,1.0]]
> PDFBox Matrix:
> [[2.0,-0.0,0.0][0.0,2.0,0.0][1.251E-12,1684.0,1.0]] *
> [[0.3,0.0,0.0][0.0,0.37,0.0][302.60,586.17,1.0]] =
> [[0.6,0.0,0.0][0.0,0.6,0.0][302.608,1091.38,1.0]]
> I suggest that we remove Matrix and replace it with AffineTransform.





[jira] [Updated] (PDFBOX-1960) Matrix is wrong

2014-03-04 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1960:


Description: 
I've been driven insane recently by trying to get pattern fills to render 
correctly. Patterns have their own matrix which is concatenated to the CTM and 
no matter how I applied the transformation, the results were wrong.

It turns out that org.apache.pdfbox.util.Matrix is not behaving as expected, 
here's an example from a pattern I'm working on. I performed the same 
concatenation (i.e. multiplication) using our Matrix and Java's 
AffineTransform, the results are as follows:

Java AffineTransform:
[[2.0, 0.0, 1.251E-12], [-0.0, 2.0, 1684.0]] *
[[0.6, 0.0, 302.6], [0.0, 0.6, 1091.38]] =
[[0.6,0.0,0.0][0.0,0.6,0.0][605.21,2856.34,1.0]]

PDFBox Matrix:
[[2.0,-0.0,0.0][0.0,2.0,0.0][1.251E-12,1684.0,1.0]] *
[[0.3,0.0,0.0][0.0,0.37,0.0][302.60,586.17,1.0]] =
[[0.6,0.0,0.0][0.0,0.6,0.0][302.6,1091.38,1.0]]

I suggest that we remove Matrix and replace it with AffineTransform.

  was:
I've been driven insane recently by trying to get pattern fills to render 
correctly. Patterns have their own matrix which is concatenated to the CTM and 
no matter how I applied the transformation, the results were wrong.

It turns out that org.apache.pdfbox.util.Matrix is not behaving as expected, 
here's an example from a pattern I'm working on. I performed the same 
concatenation (i.e. multiplication) using our Matrix and Java's 
AffineTransform, the results are as follows:

Java AffineTransform:
[[2.0, 0.0, 1.251E-12], [-0.0, 2.0, 1684.0]] *
[[0.6, 0.0, 302.6], [0.0, 0.6, 1091.38]] =
[[0.6,0.0,0.0][0.0,0.6,0.0][605.21,2856.34,1.0]]

PDFBox Matrix:
[[2.0,-0.0,0.0][0.0,2.0,0.0][1.251E-12,1684.0,1.0]] *
[[0.3,0.0,0.0][0.0,0.37,0.0][302.60,586.17,1.0]] =
[[0.6,0.0,0.0][0.0,0.6,0.0][302.608,1091.38,1.0]]

I suggest that we remove Matrix and replace it with AffineTransform.


> Matrix is wrong
> ---
>
> Key: PDFBOX-1960
> URL: https://issues.apache.org/jira/browse/PDFBOX-1960
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.4, 2.0.0
>Reporter: John Hewson
>Priority: Critical
>
> I've been driven insane recently by trying to get pattern fills to render 
> correctly. Patterns have their own matrix which is concatenated to the CTM 
> and no matter how I applied the transformation, the results were wrong.
> It turns out that org.apache.pdfbox.util.Matrix is not behaving as expected, 
> here's an example from a pattern I'm working on. I performed the same 
> concatenation (i.e. multiplication) using our Matrix and Java's 
> AffineTransform, the results are as follows:
> Java AffineTransform:
> [[2.0, 0.0, 1.251E-12], [-0.0, 2.0, 1684.0]] *
> [[0.6, 0.0, 302.6], [0.0, 0.6, 1091.38]] =
> [[0.6,0.0,0.0][0.0,0.6,0.0][605.21,2856.34,1.0]]
> PDFBox Matrix:
> [[2.0,-0.0,0.0][0.0,2.0,0.0][1.251E-12,1684.0,1.0]] *
> [[0.3,0.0,0.0][0.0,0.37,0.0][302.60,586.17,1.0]] =
> [[0.6,0.0,0.0][0.0,0.6,0.0][302.6,1091.38,1.0]]
> I suggest that we remove Matrix and replace it with AffineTransform.





[jira] [Commented] (PDFBOX-1960) Matrix is wrong

2014-03-04 Thread Maruan Sahyoun (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920567#comment-13920567
 ] 

Maruan Sahyoun commented on PDFBOX-1960:


I’m not in favor of replacing Matrix with AffineTransform as one of the goals 
of 2.0 is to have a core which can potentially be ported to Android. As 
AffineTransform is not available on Android we’d need something there too 
anyway. And Matrix is used inside PDFBox not only for drawing purposes. So I’d 
propose to fix Matrix if the results are wrong.






[jira] [Commented] (PDFBOX-1960) Matrix is wrong

2014-03-04 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920585#comment-13920585
 ] 

Tilman Hausherr commented on PDFBOX-1960:
-

I'm also not in favor of replacing Matrix, because I believe there is a 
reason that AffineTransform wasn't used in the first place. The little I've 
understood from reading the spec is that it's not really the same. I 
prefer not to have another "no stone unturned" change if there is an 
alternative. My own strategy has always been to convert Matrix locally to an 
AffineTransform.

I also don't understand the example: one has 0.6, the other has 0.3, so it's 
not surprising that the results are different.
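
A minimal sketch of what such a local conversion could look like, assuming the PDF matrix is held as a plain 3x3 array in the layout [a b 0; c d 0; e f 1]. The class and method names are hypothetical and this is not PDFBox code. The six-argument AffineTransform constructor takes (m00, m10, m01, m11, m02, m12), which lines up with the PDF entries a, b, c, d, e, f in that order.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

// Hypothetical sketch: a 3x3 array stands in for a PDF-style matrix.
final class LocalConversionSketch {

    // PDF layout [a b 0; c d 0; e f 1] maps to AffineTransform(a, b, c, d, e, f).
    static AffineTransform toAffineTransform(double[][] m) {
        return new AffineTransform(m[0][0], m[0][1], m[1][0], m[1][1], m[2][0], m[2][1]);
    }

    // Row vector [x y 1] times the PDF-style matrix.
    static double[] transformRow(double[][] m, double x, double y) {
        return new double[] {
            x * m[0][0] + y * m[1][0] + m[2][0],
            x * m[0][1] + y * m[1][1] + m[2][1]
        };
    }

    // Same point pushed through the converted AffineTransform.
    static double[] transformAwt(double[][] m, double x, double y) {
        Point2D p = toAffineTransform(m).transform(new Point2D.Double(x, y), null);
        return new double[] {p.getX(), p.getY()};
    }

    public static void main(String[] args) {
        double[][] m = {{2, 0.5, 0}, {-1, 3, 0}, {10, 20, 1}};
        double[] row = transformRow(m, 4, 5);
        double[] awt = transformAwt(m, 4, 5);
        System.out.println(row[0] + " " + row[1]);
        System.out.println(awt[0] + " " + awt[1]);
    }
}
```

Both paths map a point to the same place, so a conversion of this kind changes only the storage layout, not the transformation itself.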






[jira] [Assigned] (PDFBOX-1960) Matrix is wrong

2014-03-04 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson reassigned PDFBOX-1960:
---

Assignee: Maruan Sahyoun

Ok Maruan, I've assigned this issue to you so that you can fix the matrix code.

> Matrix is wrong
> ---
>
> Key: PDFBOX-1960
> URL: https://issues.apache.org/jira/browse/PDFBOX-1960
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.4, 2.0.0
>Reporter: John Hewson
>Assignee: Maruan Sahyoun
>Priority: Critical
>
> I've been driven insane recently by trying to get pattern fills to render 
> correctly. Patterns have their own matrix which is concatenated to the CTM 
> and no matter how I applied the transformation, the results were wrong.
> It turns out that org.apache.pdfbox.util.Matrix is not behaving as expected, 
> here's an example from a pattern I'm working on. I performed the same 
> concatenation (i.e. multiplication) using our Matrix and Java's 
> AffineTransform, the results are as follows:
> Java AffineTransform:
> [[2.0, 0.0, 1.251E-12], [-0.0, 2.0, 1684.0]] *
> [[0.6, 0.0, 302.6], [0.0, 0.6, 1091.38]] =
> [[0.6,0.0,0.0][0.0,0.6,0.0][605.21,2856.34,1.0]]
> PDFBox Matrix:
> [[2.0,-0.0,0.0][0.0,2.0,0.0][1.251E-12,1684.0,1.0]] *
> [[0.3,0.0,0.0][0.0,0.37,0.0][302.60,586.17,1.0]] =
> [[0.6,0.0,0.0][0.0,0.6,0.0][302.6,1091.38,1.0]]
> I suggest that we remove Matrix and replace it with AffineTransform.





[jira] [Commented] (PDFBOX-1960) Matrix is wrong

2014-03-04 Thread Maruan Sahyoun (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920604#comment-13920604
 ] 

Maruan Sahyoun commented on PDFBOX-1960:


Hi John, a PDF matrix and an AffineTransform matrix are defined differently. 
E.g. in AffineTransform the last row is fixed at 0 0 1, whereas in a PDF 
matrix the last column is fixed at 0 0 1. I haven’t looked at the code yet, but 
I want to make sure it’s clear that the individual numbers cannot be compared 
directly; the differences have to be taken into account when inspecting the 
results.

To support the work you have assigned to me, could you come up with some unit 
tests for the Matrix operations? Beyond showing that the results of 
AffineTransform and Matrix differ, I’m not sure I understand what the expected 
result is. 
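
To make the convention difference concrete, here is a minimal sketch of the kind of check such unit tests could start from. It does not use PDFBox's Matrix class: multiplyPdf is a hypothetical hand-written helper that multiplies PDF-style matrices {a, b, c, d, e, f} in row-vector order, and it is compared against java.awt.geom.AffineTransform. The input values are loosely based on the report; treating the second matrix as a uniform 0.3 scale is an assumption.

```java
import java.awt.geom.AffineTransform;
import java.util.Arrays;

// Illustrative sketch only; this is not PDFBox's Matrix class.
class MatrixConventionDemo {

    /**
     * Multiplies two PDF-style matrices {a, b, c, d, e, f}, each standing for
     * [a b 0; c d 0; e f 1], in row-vector order: m1 is applied first, then m2.
     */
    static double[] multiplyPdf(double[] m1, double[] m2) {
        return new double[] {
            m1[0] * m2[0] + m1[1] * m2[2],
            m1[0] * m2[1] + m1[1] * m2[3],
            m1[2] * m2[0] + m1[3] * m2[2],
            m1[2] * m2[1] + m1[3] * m2[3],
            m1[4] * m2[0] + m1[5] * m2[2] + m2[4],
            m1[4] * m2[1] + m1[5] * m2[3] + m2[5]
        };
    }

    /** Returns {m00, m10, m01, m11, m02, m12}, which lines up with PDF {a, b, c, d, e, f}. */
    static double[] toPdfOrder(AffineTransform at) {
        double[] m = new double[6];
        at.getMatrix(m);
        return m;
    }

    /** AffineTransform counterpart of the PDF product m1 x m2 (m1 applied first). */
    static double[] multiplyAwt(double[] m1, double[] m2) {
        AffineTransform result = new AffineTransform(m1);
        result.preConcatenate(new AffineTransform(m2));
        return toPdfOrder(result);
    }

    /** Result of first.concatenate(second), where second is applied to a point first. */
    static double[] concatenateAwt(double[] first, double[] second) {
        AffineTransform result = new AffineTransform(first);
        result.concatenate(new AffineTransform(second));
        return toPdfOrder(result);
    }

    public static void main(String[] args) {
        double[] a = {2.0, 0.0, 0.0, 2.0, 1.251E-12, 1684.0};
        double[] b = {0.3, 0.0, 0.0, 0.3, 302.6, 586.17}; // assumed values
        System.out.println("PDF a x b:        " + Arrays.toString(multiplyPdf(a, b)));
        System.out.println("AWT preConcat:    " + Arrays.toString(multiplyAwt(a, b)));
        System.out.println("AWT a.concat(b):  " + Arrays.toString(concatenateAwt(a, b)));
        System.out.println("PDF b x a:        " + Arrays.toString(multiplyPdf(b, a)));
    }
}
```

With these inputs, a.concatenate(b) agrees with the PDF product b x a, not a x b, so a difference like the one in the report could come from operand order alone. Whether that explains the pattern rendering problem would still need to be checked against the actual Matrix code.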
