Re: [iText-questions] XMLWorker - difference between creating blank pdf and filling out an existing (with a field). (Daniel Lehtihet)

Daniel Lehtihet Mon, 23 Sep 2013 00:43:40 -0700

Hi,

Yes, I know this is not a paid support forum, but this seems to be very 
odd behaviour that perhaps needs to be looked into. Why would the exact 
same code behave differently when writing to a blank page (with its 
x, y, x1, y1 coordinates) versus writing to a placeholder on the blank page 
(with its x, y, x1 and y1 coordinates)?





From:   itext-questions-requ...@lists.sourceforge.net
To:     itext-questions@lists.sourceforge.net, 
Date:   2013-09-06 16:05
Subject:        iText-questions Digest, Vol 88, Issue 6




Message: 1
Date: Fri, 6 Sep 2013 15:02:47 +0200
From: Daniel Lehtihet <daniel.lehti...@folksam.se>
Subject: [iText-questions] XMLWorker - difference between creating
                 blank pdf and filling out an existing (with a field).
To: itext-questions@lists.sourceforge.net
Message-ID:
 
<of828a9458.5dd14d7e-onc1257bde.00467582-c1257bde.0047a...@intern.folksam.se>
 
Content-Type: text/plain; charset="iso-8859-1"

Hi,

I have a question about how different the output can be when 
transforming XHTML (using XMLWorker) in either of two ways:

a) creating a new blank PDF and filling it in

and

b) using an existing PDF with a field as placeholder (well, actually 
rendering the result within the field's bounds)


With the "blank" route, the HTML displays just fine. With the 
"existing PDF" route, some XHTML looks very strange (overlapping text).

Let me show you an example of what I mean. I have a class with two 
methods. One, gen1(), produces a PDF named "outputGen1" (here you will see 
the headline text overlap). The other, gen2(), produces a PDF named 
"outputGen2" (and here it looks just fine).

Code for "Gen":

package se.folksam.test;

import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

import org.w3c.tidy.Tidy;

import com.itextpdf.text.Chunk;
import com.itextpdf.text.Document;
import com.itextpdf.text.Element;
import com.itextpdf.text.FontFactory;
import com.itextpdf.text.pdf.AcroFields;
import com.itextpdf.text.pdf.AcroFields.FieldPosition;
import com.itextpdf.text.pdf.ColumnText;
import com.itextpdf.text.pdf.PdfContentByte;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.Pipeline;
import com.itextpdf.tool.xml.XMLWorker;
import com.itextpdf.tool.xml.XMLWorkerHelper;
import com.itextpdf.tool.xml.html.Tags;
import com.itextpdf.tool.xml.parser.XMLParser;
import com.itextpdf.tool.xml.pipeline.css.CSSResolver;
import com.itextpdf.tool.xml.pipeline.css.CssResolverPipeline;
import com.itextpdf.tool.xml.pipeline.end.PdfWriterPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipelineContext;

public class Gen {

 
        /**
         * @param args
         */
        public static void main(String[] args) {
                Gen g = new Gen();
                g.gen1();
                g.gen2();

        }

 
        public void gen1() {
                try {
 
                        FontFactory.registerDirectories();
                        ByteArrayOutputStream baos = new ByteArrayOutputStream();
                        PdfReader reader = new PdfReader("c:/temp/Huge_Text_Field.pdf");
                        PdfStamper stp = new PdfStamper(reader, baos);
 
                        AcroFields af = stp.getAcroFields();
 
                        String html = readFile("c:/temp/markup.html");
 
                        // Use JTidy to force html to xhtml
                        Tidy tidy = new Tidy();
                        tidy.setMakeClean(true);
                        tidy.setXHTML(true);
                        tidy.setBreakBeforeBR(false);
                        tidy.setShowWarnings(false);
 
                        ByteArrayOutputStream os = new ByteArrayOutputStream();
                        InputStream is = new ByteArrayInputStream(html.getBytes("ISO-8859-1"));
                        tidy.parse(is, os);
                        String fieldValue = os.toString();
 
                        StringReader sr = new StringReader(fieldValue);
 
                        ArrayList array = new ArrayList();
                        MyElementHandler ehandler = new MyElementHandler(array);
                        XMLWorkerHelper wx = XMLWorkerHelper.getInstance();
                        wx.parseXHtml(ehandler, sr);
                        array = ehandler.getArrayList();
 
                        // the body field
                        java.util.List<FieldPosition> posArr = af.getFieldPositions("Text");
 
 
                        FieldPosition bodyPosition = posArr.get(0);
                        PdfContentByte cb = stp.getOverContent((int) bodyPosition.page);
                        ColumnText ct = new ColumnText(cb);

                        // X1 top, y1 top, x2, y2
                        //0=page, 1=llx, 2=lly, 3=urx, 4=ury
                        ct.setSimpleColumn(bodyPosition.position.getLeft(),
                                        bodyPosition.position.getTop(),
                                        bodyPosition.position.getRight(),
                                        bodyPosition.position.getBottom());
                        float curLead = ct.getLeading();

                        // Can possibly be changed to a smaller value for less effect on
                        // line spacing (reduce it more and the lines run together more...)
                        ct.setLeading(curLead - 0.5f);

                        Element el = null;

                        int currPageNbr[] = new int[1];
                        currPageNbr[0] = (int)bodyPosition.page; 
                        String text = "";
 
 
                        int myArraySize = array.size();
                        // loop through the body text in its entirety
                        for (int idx = 0; idx < myArraySize; idx++)
                        {
                                el = (Element)array.get(idx);
 
 
                                List<Chunk> chunks = el.getChunks();
                                if (chunks.size() > 0) {
                                        Chunk chunk = chunks.get(0); // get the others if needed
                                        text = chunk.getContent().trim();
                                } else
                                        text = "";   // This contains nothing; ignore it
 
 
                                ct.addElement(el);
                                int res = ct.go();
                        }
 
                        ct.go();
 
                        stp.close();
 
                        // try-with-resources so the output file is flushed and closed
                        try (OutputStream outputStream = new FileOutputStream("c:/temp/outputGen1.pdf")) {
                                baos.writeTo(outputStream);
                        }
 
 
 
                } catch (Exception e) {
                        e.printStackTrace();
                }
        }
 
 
        public void gen2()  {
 
                try {
                FontFactory.registerDirectories();
                Document document = new Document();

                ByteArrayOutputStream baos = new ByteArrayOutputStream();
 
                String html = readFile("c:/temp/markup.html");
 
                PdfWriter writer = PdfWriter.getInstance(document, baos);
 
                document.open();

                HtmlPipelineContext htmlContext = new HtmlPipelineContext(null);
                htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());

                CSSResolver cssResolver =
                    XMLWorkerHelper.getInstance().getDefaultCssResolver(true);

                // CSS resolution -> HTML tag processing -> write straight to the document
                Pipeline<?> pipeline =
                    new CssResolverPipeline(cssResolver,
                            new HtmlPipeline(htmlContext,
                                new PdfWriterPipeline(document, writer)));

                XMLWorker worker = new XMLWorker(pipeline, true);
                XMLParser p = new XMLParser(worker);
 
                // Use JTidy to force html to xhtml
                Tidy tidy = new Tidy();
                tidy.setMakeClean(true);
                tidy.setXHTML(true);
                tidy.setBreakBeforeBR(false);
                tidy.setShowWarnings(false);
 
                ByteArrayOutputStream os = new ByteArrayOutputStream();
                InputStream is = new ByteArrayInputStream(html.getBytes("ISO-8859-1"));
                tidy.parse(is, os);
                String fieldValue = os.toString();
 
                p.parse(new StringReader(fieldValue));

                document.close();
 
                // try-with-resources so the output file is flushed and closed
                try (OutputStream outputStream = new FileOutputStream("c:/temp/outputGen2.pdf")) {
                        baos.writeTo(outputStream);
                }
 
                } catch (Exception e) {
                        e.printStackTrace();
                }
 
        }
 
        public String readFile(String path) throws Exception  {
                 BufferedReader br = new BufferedReader(new FileReader(path));
                 String everything = "";
                    try {
                        StringBuilder sb = new StringBuilder();
                        String line = br.readLine();

                        while (line != null) {
                            sb.append(line);
                            sb.append('\n');
                            line = br.readLine();
                        }
                        everything = sb.toString();
                    } finally {
                        br.close();
                    }
 
                    return everything;
        }

}
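
A side note on the listing above: readFile() uses FileReader and the JTidy output is decoded with the no-argument os.toString(), both of which fall back to the JVM's default charset, while the input bytes are produced with "ISO-8859-1". Below is a minimal sketch of keeping the charset explicit on every step, assuming the markup file really is ISO-8859-1. The class name CharsetRoundTrip, its helper methods, and the temporary file are hypothetical stand-ins and not part of the original code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CharsetRoundTrip {

    // Read a whole file, decoding it with an explicit charset
    // instead of the JVM default that FileReader falls back to.
    static String readFile(Path path, Charset charset) throws IOException {
        return new String(Files.readAllBytes(path), charset);
    }

    // Decode buffered bytes with an explicit charset instead of
    // the JVM default that the no-argument toString() uses.
    static String decode(ByteArrayOutputStream os, Charset charset) {
        return new String(os.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        // Assumed encoding of the markup file
        Charset latin1 = StandardCharsets.ISO_8859_1;

        // Hypothetical stand-in for c:/temp/markup.html
        Path tmp = Files.createTempFile("markup", ".html");
        Files.write(tmp, "V\u00e4nliga h\u00e4lsningar".getBytes(latin1));

        String html = readFile(tmp, latin1);

        // Stand-in for the stream that JTidy writes its output to
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        os.write(html.getBytes(latin1));

        // The text survives the round trip when the same charset is used throughout
        System.out.println(decode(os, latin1).equals(html));

        Files.delete(tmp);
    }
}
```

This only concerns non-ASCII characters such as the Swedish letters in the markup; it is not offered as an explanation for the overlapping text.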


Here is the accompanying class MyElementHandler:


package se.folksam.test;

import java.util.ArrayList;
import java.util.List;
import com.itextpdf.text.Element;
import com.itextpdf.tool.xml.ElementHandler;
import com.itextpdf.tool.xml.Writable;
import com.itextpdf.tool.xml.pipeline.WritableElement;

public class MyElementHandler implements ElementHandler {

        // Elements collected from the XMLWorker pipeline, in document order
        private final ArrayList<Element> array;
 
        public MyElementHandler(ArrayList<Element> arr) {
                this.array = arr;
        } 
 
        public ArrayList<Element> getArrayList() {
                return this.array;
        }
 
        public void add(final Writable w) {
                if (w instanceof WritableElement) { 
                        List<Element> elements = ((WritableElement) w).elements();
                        // collect in array
                        array.addAll(elements);
                }
        }
}



And the actual HTML file that I use:




<br><strong>sdfsdf<br>s<br>df<br></strong> <ul> <li>sdf <li>s <li>dfs 
<li></li> </ul> <ol> <li></li> </ol> <span style="FONT-SIZE: 24px"><span 
style="COLOR: #737373">df<br>sd<br>f<br></span></span>s<br><br>Vänliga 
hälsningar<br>Department XXX<br>Joe 
Doe<br>some.em...@company.xxx<br>Phone: 555 - 123456





And the PDF file:

(Yes, it's pure nonsense, but it shows the problem quite well.)

My real question is: why does the output differ between a blank document 
and a (large) field used as the boundary, when the code does the exact 
same thing?
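
One visible difference between the two methods is that gen1() sets a fixed leading on the ColumnText (the current value minus 0.5), while gen2() leaves line spacing entirely to the pipeline. Whether that fixed leading is what actually applies to the collected elements is an assumption and not something confirmed here. The sketch below only shows the arithmetic worth checking: the markup asks for a 24px font, and lines tend to collide once the leading is smaller than the font size. The class LeadingCheck, its methods, and the starting leading of 16 are hypothetical; the real value is whatever ct.getLeading() returns.

```java
public class LeadingCheck {

    // CSS defines 1px as 0.75pt (96px per inch, 72pt per inch).
    static float pxToPt(float px) {
        return px * 0.75f;
    }

    // Rough rule of thumb: lines start to collide once the
    // baseline-to-baseline distance (leading) is smaller than the font size.
    static boolean linesMayOverlap(float fontSizePt, float leadingPt) {
        return leadingPt < fontSizePt;
    }

    public static void main(String[] args) {
        // From style="FONT-SIZE: 24px" in the markup
        float fontSizePt = pxToPt(24f);

        // Hypothetical starting value; substitute the real ct.getLeading() result
        float assumedLeading = 16f;

        // Same reduction as in gen1()
        float reducedLeading = assumedLeading - 0.5f;

        System.out.println("font size (pt): " + fontSizePt);
        System.out.println("leading (pt):   " + reducedLeading);
        System.out.println("may overlap:    " + linesMayOverlap(fontSizePt, reducedLeading));
    }
}
```

If the numbers logged from the real run look like this, comparing the output with the setLeading() call removed would be a cheap way to confirm or rule out this explanation.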




Kind regards

Daniel

Daniel Lehtihet
IT Architect
Architecture

Folksam
106 60 Stockholm
Visiting address: Bohusgatan 14
Phone: 08-7726041
Mobile: 0708-31 51 71
daniel.lehti...@folksam.se
http://www.folksam.se
-------------- next part --------------
An HTML attachment was scrubbed...
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Huge_Text_Field.pdf
Type: application/octet-stream
Size: 6176 bytes
Desc: not available


_______________________________________________
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions

iText(R) is a registered trademark of 1T3XT BVBA
