from:"Mike Marchywka"

Re: [iText-questions] NPE while Extracting text

2010-06-21 Thread Mike Marchywka

Date: Mon, 21 Jun 2010 09:49:44 +0100
From: b...@benshort.co.uk
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] NPE while Extracting text

Thanks very much for this information.

Maybe you could offer me some direction of how to solve my problem?

I need to parse PDF mobile phone bills. The information I require is
the itemized data that is in a table format. Is this possible with
itextpdf?

I know this won't help you, but let's be clear: PDF is NOT the format
of choice for DATA or INFORMATION. It is generally about
human readability, and while a document often has a describable structure,
everyone here tells me it is too complicated to include that in the
PDF file. If you have a choice, and have a cooperative relationship
with the source of the documents, you want an INFORMATION
format, not a bunch of pixels. Scraping HTML or PDF is
often done by people trying to extract information from artwork,
but you always need to make assumptions about the document
structure. If you want a robust means to do this,
at least work out some conventions with the document authors.

The great leap in information representation in going from
pictures to an alphabet is that fonts don't matter. You
probably want to extract the text and scrap the font
stuff. If text cannot be extracted easily from the PDF itself,
you need to reduce it to pixels and then extract it with
OCR software. Or, get the document author to include only
the important stuff to begin with.

On 19 June 2010 08:44, 1T3XT info wrote:
Ben Short wrote:
subType is /Type3

Does this help identify the problem?

Yes, but it doesn't bring us closer to a solution.

Type 3 fonts are user defined fonts.

See for instance:
http://itextpdf.com/examples/index.php?page=example&id=200
In that example, a 'delta' and 'sigma' shaped glyph was defined,
corresponding with the characters 'D' and 'S'. However, the example
would also have worked if we'd used any other character.

Another example: we could define a glyph that looks like the symbol for
'The Artist Formerly Known As Prince' to correspond with the character
'P'. That's what Type 3 fonts are about: they can be used when a user
needs a glyph that isn't provided in any other font.
Therefore it's very hard to extract that content: how are you going to
know that the glyph corresponding with 'P' needs to be 'translated' to
'The Artist Formerly Known As Prince'? I don't think there's a Unicode
code point for that glyph.

I think you've hit a limitation regarding text extraction in general.
--
This answer is provided by 1T3XT BVBA
http://www.1t3xt.com/ - http://www.1t3xt.info

--
ThinkGeek and WIRED's GeekDad team up for the Ultimate
GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the
lucky parental unit. See the prize list and enter to win:
http://p.sf.net/sfu/thinkgeek-promo
___
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions

Buy the iText book: http://www.itextpdf.com/book/
Check the site with examples before you ask questions:
http://www.1t3xt.info/examples/
You can also search the keywords list: http://1t3xt.info/tutorials/keywords/

_
The New Busy is not the old busy. Search, chat and e-mail from your inbox.
http://www.windowslive.com/campaign/thenewbusy?ocid=PID28326::T:WLMTAGL:ON:WL:en-US:WM_HMP:042010_3

Re: [iText-questions] iText Optimization

2010-06-12 Thread Mike Marchywka

Date: Fri, 11 Jun 2010 22:55:53 -0700
From: thanga...@gmail.com
To: itext-questions@lists.sourceforge.net
Subject: [iText-questions] iText Optimization

Hi,

I have pasted a screen shot of CPU profiling for JasperReports. The report
takes about 9 seconds to generate, 0.5 seconds of that is spent instantiating
PdfGraphics2D.

http://i.imgur.com/arzCI.png

First, any optimization relies on getting the right data structures and
representations. This of course is the first half of the data structures
and algorithms thinking that precedes local coding and implementation
optimizations. You pointed us to a 61k image when a few lines of ASCII text
would have been more readable (I'm still fiddling with eog zoom LOL) and more
portable and versatile for automated analysis in other places (let's say I
wanted to import these results into a bash script and use them as a benchmark
against alternative code).

This approach is my biggest concern with people in the PDF community: focusing
on pictures rather than information. There is nothing wrong with human
readability, but just because you have a bloated picture in a standard format
doesn't mean you have added any utility to your output. In this case a nice
ASCII table would better serve the purpose of information sharing. Something
to think about when you make your next work of art that obscures information.

Briefly, I noticed a few things:

1. No lazy initialization is being used.

Often with long complicated things, you can do some thinking up front and pick
a strategy before doing anything. This would involve looking at whatever input
parameters are cheap and easy to examine, estimating some parameters, and then
initializing everything up front with optimal memory maps etc. I sympathize
with your concern, as I have seen code that takes longer to start up than to
execute, but in this case, if you expect to do something that takes a while,
you can do some order-zero thinking that will pay off in the inner loops.

2. Two instances of AffineTransform() are created (IDENTITY constant and one
in the constructor).
3. Redundant instance variable assignments to false.

I find myself doing this all the time, knowing it is a waste. Not sure why. LOL.
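
For readers who want to see what "lazy initialization" from item 1 could look like, here is a minimal sketch. It assumes nothing about the real PdfGraphics2D source: the class LazyTransformHolder, its field names and the allocation counter are all made up for illustration. The only real API used is java.awt.geom.AffineTransform.

```java
import java.awt.geom.AffineTransform;

/** Hypothetical stand-in for a graphics class; NOT the real PdfGraphics2D. */
final class LazyTransformHolder {
    // One shared identity transform, created once when the class loads.
    private static final AffineTransform IDENTITY = new AffineTransform();

    // Deliberately not allocated in the constructor.
    private AffineTransform transform;

    // Counts lazy allocations so the effect can be observed.
    static int allocations = 0;

    /** Creates the per-instance transform on first use only. */
    AffineTransform getTransform() {
        if (transform == null) {
            transform = new AffineTransform(IDENTITY); // copy of the identity
            allocations++;
        }
        return transform;
    }

    public static void main(String[] args) {
        LazyTransformHolder holder = new LazyTransformHolder();
        System.out.println("allocations after construction: " + allocations);
        holder.getTransform();
        holder.getTransform();
        System.out.println("allocations after two calls: " + allocations);
    }
}
```

This version is not thread-safe, and whether it saves anything measurable depends on how often the transform is actually needed; it would have to be profiled against the eager version.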

[...]

If you change the code around to use an instance variable (by uncommenting
the commented lines and making the appropriate change in the inner loop),
you'll see at least an order of magnitude increase in speed. (This is because
the JVM uses a longer bytecode to address a class variable than it does a
local scope variable; I don't think the JIT will optimize it.)

There could be several factors here, but probably the issue is memory
coherence. IIRC, results that are not observable don't have to be published by
being written to main memory. If you want another thread to see what you are
doing, then you need to use the member and declare it volatile. Otherwise,
explicit use of a local is more likely to stop the JRE from doing less
predictable memory accesses; it is already using the stack a lot. Even if this
is all HotSpot compiled, the local is likely to produce stack-relative code.
You could argue that the members should be in a low-level cache somewhere,
since well-written code is likely to have most variable references go to
"this", but that may not help with larger objects etc.
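
A rough sketch of the field-versus-local comparison being discussed, for anyone who wants to try it. The class FieldVersusLocal and its method names are hypothetical, and the timing in main is deliberately naive: the JIT may well make both loops equally fast, so treat the numbers as a hint at most and use a proper benchmark harness for anything serious.

```java
final class FieldVersusLocal {
    private long total; // instance field touched on every iteration below

    /** Accumulates directly into the instance field. */
    long sumViaField(int n) {
        total = 0;
        for (int i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    /** Accumulates into a local and writes the field once at the end. */
    long sumViaLocal(int n) {
        long local = 0;
        for (int i = 0; i < n; i++) {
            local += i;
        }
        total = local;
        return local;
    }

    public static void main(String[] args) {
        FieldVersusLocal f = new FieldVersusLocal();
        int n = 100_000_000;
        long t0 = System.nanoTime();
        long viaField = f.sumViaField(n);
        long t1 = System.nanoTime();
        long viaLocal = f.sumViaLocal(n);
        long t2 = System.nanoTime();
        System.out.println("field: " + (t1 - t0) / 1_000_000 + " ms, result " + viaField);
        System.out.println("local: " + (t2 - t1) / 1_000_000 + " ms, result " + viaLocal);
    }
}
```

Both methods compute the same value; only where the running total lives during the loop differs.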

The doAttributes() method, for example, will suffer a little because of this.

Dave


Re: [iText-questions] MalformedByteSequenceException

2010-06-10 Thread Mike Marchywka



On a quick look I'm not sure what you expect this to have to do with
iText. That is, it isn't obvious that the XML came from a PDF file or was
created by iText etc. Did you look at the bytes going into the
parser? In any case it would probably be obvious if you dumped
these bytes and had some idea what the method expected as valid bytes.
Is DocumentBuilder something to do with PDF?

 doc = builder.parse(new StringBufferInputStream(sourceURL));

It's early and I may have this wrong, but maybe a more direct relationship
to iText would help.
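
On the "look at the bytes" suggestion, here is a minimal sketch using only JDK classes of how one might dump what the parser actually receives. One plausible cause (an assumption; nothing in this thread confirms it) is that the deprecated java.io.StringBufferInputStream keeps only the low eight bits of each character, so a non-ASCII character in the page becomes a byte that is not valid UTF-8. The class name ParserInputBytes and the sample string are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringBufferInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class ParserInputBytes {
    /** Reads a stream to the end and returns its bytes as space-separated hex. */
    static String hexDump(InputStream in) {
        try {
            StringBuilder sb = new StringBuilder();
            int b;
            while ((b = in.read()) != -1) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(String.format("%02X", b));
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Bytes as the deprecated StringBufferInputStream would deliver them. */
    @SuppressWarnings("deprecation")
    static String legacyDump(String text) {
        return hexDump(new StringBufferInputStream(text));
    }

    /** Bytes after an explicit UTF-8 encoding step. */
    static String utf8Dump(String text) {
        return hexDump(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
    }

    /** Parses XML text after encoding it explicitly as UTF-8. */
    static Document parseUtf8(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return builder.parse(new ByteArrayInputStream(bytes));
        } catch (Exception e) {
            throw new IllegalStateException("XML did not parse", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<p>caf\u00E9 \u20AC</p>"; // made-up sample with non-ASCII text
        System.out.println("legacy: " + legacyDump(xml));
        System.out.println("utf-8 : " + utf8Dump(xml));
        System.out.println(parseUtf8(xml).getDocumentElement().getTextContent());
    }
}
```

If the two dumps differ for the real page, then handing the parser explicitly encoded bytes (or the URL's own stream) rather than a StringBufferInputStream would be the first thing to try. The charset used when reading the page in the first place matters too, and that is a separate question.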




 To: itext-questions@lists.sourceforge.net
 From: kishore.chitt...@tcs.com
 Date: Thu, 10 Jun 2010 11:16:08 +
 Subject: [iText-questions] MalformedByteSequenceException

 Hi,
 I am getting malformed exception when i am generating PDF. Please refer the
 sample code and exception stack trace below. Can you please let me know what
 needs to be done.


 Exception
 

 com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 3 of 3-byte UTF-8 sequence.
     at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:674)
     at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:425)
     at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1742)
     at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.scanContent(XMLEntityScanner.java:916)
     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2773)
     at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:647)
     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:508)
     at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:807)
     at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
     at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:107)
     at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:225)
     at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:283)
     at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
     at ExportAlltabsToPdfAction.main(ExportAlltabsToPdfAction.java:61)

 ---

 Sample Code:
 

 import java.io.BufferedReader;
 import java.io.FileOutputStream;
 import java.io.InputStreamReader;
 import java.io.OutputStream;
 import java.io.StringBufferInputStream;
 import java.net.URL;

 import javax.xml.parsers.DocumentBuilder;
 import javax.xml.parsers.DocumentBuilderFactory;

 import org.w3c.dom.Document;
 import org.xhtmlrenderer.pdf.ITextRenderer;
 import org.xhtmlrenderer.resource.FSEntityResolver;

 public class ExportAlltabsToPdfAction
 {
     public static void main(String args[]) throws Exception
     {
         try
         {
             // Read the whole page into memory, line by line.
             StringBuffer inputFile = new StringBuffer();
             URL yahoo = new URL("http://dfte.ual.com/wiki/index.php/Main_Page");
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(yahoo.openStream()));
             String inputLine;
             while ((inputLine = in.readLine()) != null) {
                 inputFile.append(inputLine);
             }
             in.close();

             String outputFile = "C:\\AllTabs.pdf";
             OutputStream os = new FileOutputStream(outputFile);

             DocumentBuilder builder =
                     DocumentBuilderFactory.newInstance().newDocumentBuilder();
             FSEntityResolver er = FSEntityResolver.instance();
             builder.setEntityResolver(er);

             String sourceURL = inputFile.toString();
             System.out.println(sourceURL);
             // This is the parse call that appears in the stack trace above.
             Document doc = builder.parse(new StringBufferInputStream(sourceURL));

             // Lay out the parsed document and write it as a PDF.
             ITextRenderer renderer = new ITextRenderer();
             renderer.setDocument(doc, null);
             renderer.layout();
             renderer.createPDF(os);

             os.close();
         } catch (Exception e) {
             e.printStackTrace();
         }
     }
 }




Re: [iText-questions] MalformedByteSequenceException

2010-06-10 Thread Mike Marchywka










 To: itext-questions@lists.sourceforge.net
 From: kishore.chitt...@tcs.com
 Date: Thu, 10 Jun 2010 17:30:36 +0530
 Subject: Re: [iText-questions] MalformedByteSequenceException



 Hi,

 Can you please guide me how to resolve
 this issue. Even though it is not related with itext.


The fact that it isn't related to iText doesn't stop people from responding;
however, you are likely to get responses like "hire a programmer" at this point :)
No one here knows what your input data looks like. You need to validate
it in any case for a real app: never assume another server returns good
stuff.

 Thanks & Regards

 Kishore CH


Re: [iText-questions] iText Perfomance Issue on WebLogic 9.2

2010-06-04 Thread Mike Marchywka

Date: Fri, 4 Jun 2010 12:51:07 +0200
From: klas.lindb...@val.se
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] iText Perfomance Issue on WebLogic 9.2

One obvious thing to look at is physical memory and paging. I have a
hunch that WLS is more memory-consuming than Tomcat, leaving less for iText,
which may cause paging to occur.

But memory is cheap and my disk is very very very very fast. LOL.
This is a problem with almost all apps today; my current frustration
is browsers (not just PDF files any more LOL). I'm not even sure if
there are good diagnostics here other than looking at page faults in
Task Manager. I've learned to start swearing once my disk light comes on
and I'm not doing overt file IO. LOL

Also, profiling was suggested, and I agree that it is a very good idea
to help pinpoint the source of the problem.

/Klas

Re: [iText-questions] iText Perfomance Issue on WebLogic 9.2 MP3

2010-06-03 Thread Mike Marchywka





( extra space courtesy of hotmail tired of editing it out) 






 Date: Thu, 3 Jun 2010 08:55:14 -0700
 From: msto...@autonomy.com
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] iText Perfomance Issue on WebLogic 9.2 MP3
























 That’s the first I’ve heard of it. Can you profile it to see what’s taking so long?

Also, if you want to do comparative tests, perhaps some
analysis of your control group would help.
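
As a starting point for a comparative test, here is a small timing sketch that could be dropped unchanged into both the Tomcat and the WebLogic deployment, so that the only variable is the container. TimingHarness and medianNanos are hypothetical names, and the workload in main is a placeholder; in the real test it would be the code that builds the 2000-row table and calls Document.add.

```java
import java.util.Arrays;

final class TimingHarness {
    /**
     * Runs the task `warmups` times untimed, then `runs` times timed,
     * and returns the middle sample in nanoseconds (upper median for even counts).
     */
    static long medianNanos(Runnable task, int warmups, int runs) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be >= 1");
        }
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) {
        // Placeholder workload, standing in for the real PDF-building code.
        Runnable workload = () -> {
            StringBuilder sb = new StringBuilder();
            for (int row = 0; row < 2000; row++) {
                sb.append("row ").append(row).append('\n');
            }
            if (sb.length() == 0) {
                throw new IllegalStateException("workload produced nothing");
            }
        };
        long median = medianNanos(workload, 5, 21);
        System.out.println("median: " + median / 1000 + " us");
    }
}
```

A median over several runs is used so that a single garbage collection or page fault does not dominate the comparison. A profiler is still the better tool for finding where the time goes.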









 --Mark Storer
 Senior Software Engineer
 Cardiff.com

 import legalese.Disclaimer;
 Disclaimer DisCard = null;














 





 From: George Li [mailto:g...@varicent.com]
 Sent: Wednesday, June 02, 2010 3:21 PM
 To: itext-questions@lists.sourceforge.net
 Subject: [iText-questions] iText Perfomance Issue on WebLogic 9.2 MP3









 Hi,

 I have a PDF exporting service hosted on WebLogic 9.2 MP3. I find that the
 Document.add(Element) call is 2-3 times slower than when the PDF service
 is hosted on Tomcat (on the same machine), especially if the element is a
 huge one, such as a table of 2000 rows. Is there any fix for this problem?



















Re: [iText-questions] getDpiX, Y returns 0 or wrong value but WinExplorer and Acrobate get Dpi

2010-05-31 Thread Mike Marchywka

From: psoa...@glintt.com
To: itext-questions@lists.sourceforge.net
Date: Mon, 31 May 2010 11:21:30 +0100
Subject: Re: [iText-questions] getDpiX, Y returns 0 or wrong value but
WinExplorer and Acrobate get Dpi

Why do you think iText is wrong? Post the images for inspection.

Paulo

-Original Message-
From: dermoritz [mailto:tantea...@hotmail.com]
Sent: Monday, May 31, 2010 11:17 AM
To: itext-questions@lists.sourceforge.net
Subject: [iText-questions] getDpiX, Y returns 0 or wrong value but
WinExplorer and Acrobate get Dpi

I have some problems with image.getDpiX/Y (iText 5.0.2) on some images. These
images come from different digital cameras, and WinExplorer shows 240 dpi for
all of them. I rotated one of them 90° via the Windows XP built-in image
viewer. For this rotated image getDpi returns 96 dpi! But WinExplorer,
Acrobat and MS-Office Picture Manager still show 240 dpi for all images.

So why can't iText get the correct dpi from them?

You could probably get something like ImageMagick and use identify to
dump the metadata for the images that work and those that don't. I don't
know the formats well enough to know what is possible, but looking at stuff I
have from various phones/cameras, it seems there is a resolution entry but also
something called Exif that contains more resolution info. I guess it is
possible that some images only have it in a non-standard but well-known
location.
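
If a command line tool is not at hand, a rough equivalent can be sketched with the JDK's own javax.imageio. This shows only what ImageIO reports in its standard metadata tree ("javax_imageio_1.0"), where HorizontalPixelSize is specified as the pixel width in millimetres. It is not a statement about where iText looks (the thread does not say), and as far as I know this tree does not include values stored only in Exif. The class name ImageDpiProbe and its methods are hypothetical.

```java
import java.io.File;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.stream.ImageInputStream;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class ImageDpiProbe {
    /** Converts a pixel size in millimetres to dots per inch (25.4 mm per inch). */
    static double dpiFromPixelSizeMm(double mmPerPixel) {
        return 25.4 / mmPerPixel;
    }

    /** Horizontal DPI as reported by ImageIO, or -1 if none is stated. */
    static double horizontalDpi(File file) throws Exception {
        try (ImageInputStream in = ImageIO.createImageInputStream(file)) {
            if (in == null) return -1;
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) return -1;
            ImageReader reader = readers.next();
            try {
                reader.setInput(in);
                IIOMetadata meta = reader.getImageMetadata(0);
                if (meta == null || !meta.isStandardMetadataFormatSupported()) {
                    return -1;
                }
                Element root = (Element) meta.getAsTree("javax_imageio_1.0");
                NodeList nodes = root.getElementsByTagName("HorizontalPixelSize");
                if (nodes.getLength() == 0) return -1;
                Node value = nodes.item(0).getAttributes().getNamedItem("value");
                return dpiFromPixelSizeMm(Double.parseDouble(value.getNodeValue()));
            } finally {
                reader.dispose();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java ImageDpiProbe <image> [<image> ...]");
            return;
        }
        for (String path : args) {
            System.out.println(path + ": " + horizontalDpi(new File(path)));
        }
    }
}
```

Running this on one image that works and one that does not would at least show whether the two files state their resolution in the same place.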


Re: [iText-questions] getDpiX, Y returns 0 or wrong value but WinExplorer and Acrobate get Dpi

2010-05-31 Thread Mike Marchywka

Date: Mon, 31 May 2010 04:44:47 -0700
From: tantea...@hotmail.com
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] getDpiX, Y returns 0 or wrong value but
WinExplorer and Acrobate get Dpi

thx, but looking for the correct meta-data myself is not an option. in this

LOL

case i would have to decide what the correct resolution is, or what the most
credible place to look for the resolution is.
the only thing i know is that iText doesn't show the correct resolution
(but is probably looking in a correct place for it) and many other programs
show the correct resolution (probably looking in well-known places).

can anyone tell me where iText looks for dpi? and does anybody know where
all those other programs look for it? - i think all programs only look in
one place?!

Generally you want to get the most direct or definitive result
you can. Getting a bunch of "IIRC" answers usually just leads to more problems.
It may be a simple matter to just look with a command line tool rather
than getting human input that may or may not help.

So OK, you find out that iText looks in place A and product
B looks in place C, then what? Usually, if you want
to say iText doesn't work, it is just easier to have some specific case under
which it appears to fail. If you could run identify on the bad images,
that may help, and the answer may even be evident (iText forgot
to divide by blah). If the people you are asking don't know
the answer, they will need to do all this anyway.


Re: [iText-questions] getDpiX, Y returns 0 or wrong value but WinExplorer and Acrobate get Dpi

2010-05-31 Thread Mike Marchywka











 Date: Mon, 31 May 2010 04:53:45 -0700
 From: tantea...@hotmail.com
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] getDpiX, Y returns 0 or wrong value but 
 WinExplorer and Acrobate get Dpi


 edit: i just checked some of the images here:
 http://www.fileformat.info/convert/image/identify.htm (uses imageMagick)

 for all images Resolution: 240x240 is shown!

I never hit info links, but you may need to run with -verbose to get
all the metadata in detail. Is this resolution (as in DPI) and not pixel
dimension?



Re: [iText-questions] Spam: Unit testing flattened PDFs

2010-05-28 Thread Mike Marchywka

[ after our latest SMTP exchange, notice what Hotmail does
with just plain text LOL... I'm not sure anyone even tests this
stuff ]

Date: Fri, 28 May 2010 09:03:04 -0700
From: msto...@autonomy.com
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] Spam: Unit testing flattened PDFs

Unit testing PDF is Notoriously Difficult.

For just plain pixel compares,
I've suggested this before but if you are really stuck and have
resources, consider something like instrumented video compression libraries.
That is, the compression relies on isolating things of perceptual
interest, like motion vectors for example. Now, ideally if you could
get a result that says this block is moved over between the two frames
that might be the metric you want.

Ideally, you’d save the coordinates
of your various fields and run OCR on your resulting flattened PDF, looking
for the correct text in the correct place.

Well, presumably you have the fonts, so you could render the text (e.g. ligatures etc.)
and just look for pixel blocks that match. This is a lot easier than
general OCR with unknown fonts or sizes (if you can't estimate these a priori
you are stuck LOL).
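
The crudest form of the pixel compare, as a sketch: count how many pixels differ between an expected and an actual page image. How the flattened PDF pages get rasterized is outside this example (some external renderer is assumed), and the class PixelCompare is a made-up name. Only java.awt.image.BufferedImage from the JDK is used.

```java
import java.awt.image.BufferedImage;

final class PixelCompare {
    /** Counts pixels whose ARGB values differ; both images must be the same size. */
    static int countDifferentPixels(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            throw new IllegalArgumentException("image sizes differ");
        }
        int different = 0;
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (a.getRGB(x, y) != b.getRGB(x, y)) {
                    different++;
                }
            }
        }
        return different;
    }

    public static void main(String[] args) {
        // Two tiny synthetic "pages" standing in for rendered PDF output.
        BufferedImage expected = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        BufferedImage actual = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        actual.setRGB(1, 2, 0xFFFFFF); // one white pixel on an otherwise black page
        System.out.println("different pixels: " + countDifferentPixels(expected, actual));
    }
}
```

An exact compare like this is brittle against anti-aliasing and tiny layout shifts, which is why restricting it to the known field rectangles, or allowing a tolerance, is usually the next step.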

Most people who make stuff up have a model described SOMEWHERE even if
they have to absolutely positively remove every trace of it before publishing
their standard professional result. It isn't entirely cheating to use this
for testing but you can appreciate how useful it is to those of us who
get stuck using your pixel creation too.

Realistically? Umm… ouch.
Actually, the pdf.parser.PdfTextExtractor could be Quite Helpful. Yeah…
! Check out SimpleTextExtractingPdfContentStreamProcessor. With a name like
that, it must be easy, right?


Re: [iText-questions] Blank PDF after it is transfered through SMTP

2010-05-27 Thread Mike Marchywka

Date: Wed, 26 May 2010 23:53:55 -0400
From: jgs...@gmail.com
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] Blank PDF after it is transfered through SMTP

Michael,

Your points are well-taken.

Michael wrote:
This means that your quoted-printable encoder does not do a thorough job,
either because it is buggy or because you have not told it that the data to
encode is not text where a single carriage return, a single line feed, and a
carriage return line feed combination all mean the same.

This is what I am suspecting as well. Strangely another pdf file (generated
by JClass from Quest Software) does not have inflated bytes issue, although
it went through exactly the same Java mail code I posted. I will dig further
on it.

Did anyone see my earlier link? "Unlikely" doesn't mean MUST,
and I can assure you the PDF is not human readable without a special decoder ring or
PDF viewer LOL,

http://tools.ietf.org/rfc/rfc2045.txt

6.7. Quoted-Printable Content-Transfer-Encoding

The Quoted-Printable encoding is intended to represent data that
largely consists of octets that correspond to printable characters in
the US-ASCII character set. It encodes the data in such a way that
the resulting octets are unlikely to be modified by mail transport.
If the data being encoded are mostly US-ASCII text, the encoded form
of the data remains largely recognizable by humans.

If you keep reading,

Note that many implementations may elect to encode the
local representation of various content types directly
rather than converting to canonical form first,
encoding, and then converting back to local
representation. In particular, this may apply to plain
text material on systems that use newline conventions
other than a CRLF terminator sequence. Such an
implementation optimization is permissible, but only
when the combined canonicalization-encoding step is
equivalent to performing the three steps separately.

Regards,

Jiangang Song


Re: [iText-questions] Blank PDF after it is transfered through SMTP

2010-05-27 Thread Mike Marchywka


or if you read even more, clearly this is for human readable content,
not binary data. Why are you trying this? It's unlikely that the
actual impl would even care about preserving data against some
transformations that could occur by mail anyway,

   Because quoted-printable data is generally assumed to be line-
   oriented, it is to be expected that the representation of the breaks
   between the lines of quoted-printable data may be altered in
   transport, in the same manner that plain text mail has always been
   altered in Internet mail when passing between systems with differing
   newline conventions.  If such alterations are likely to constitute a
   corruption of the data, it is probably more sensible to use the
   base64 encoding rather than the quoted-printable encoding.
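
To make the RFC's recommendation concrete, here is a small sketch of why base64 is the safer choice for binary data: line breaks in base64 text carry no data, so a transport that rewrites them does no harm. It uses java.util.Base64, which arrived in Java 8 and so postdates this thread. Base64Attachment is a made-up name, and in a real mail program the mail library would normally do the encoding once the part's transfer encoding is set to base64.

```java
import java.util.Arrays;
import java.util.Base64;

final class Base64Attachment {
    /** Encodes bytes as MIME base64 (76-character lines separated by CRLF). */
    static String encode(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }

    /** Decodes MIME base64; line separators of any kind are ignored. */
    static byte[] decode(String text) {
        return Base64.getMimeDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Every possible byte value, including 0x0A, 0x0D and all high-bit bytes.
        byte[] data = new byte[256];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        String encoded = encode(data);
        // Simulate a transport that rewrites CRLF line breaks as bare LF.
        String mangled = encoded.replace("\r\n", "\n");
        System.out.println("intact round trip : " + Arrays.equals(data, decode(encoded)));
        System.out.println("mangled round trip: " + Arrays.equals(data, decode(mangled)));
    }
}
```

The cost is size: base64 output is about a third larger than the input, which is the usual trade against quoted-printable for data that is mostly not ASCII text.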


  

Re: [iText-questions] Blank PDF after it is transfered through SMTP

2010-05-27 Thread Mike Marchywka

Date: Thu, 27 May 2010 09:16:38 -0400
From: jgs...@gmail.com
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] Blank PDF after it is transfered through SMTP

Michael,

I attached a pdf (generated by non-iText software) which was received through
email after SMTP transfer using quoted-printable.

I still don't get why you insist on using this approach given what the IETF
says about it, or what this has to do with iText.
The general intent is to encode human readable information (ASCII) such that
it is not modified in a way likely to matter to an intelligent human. The
encoding format is designed to make the encoded file human readable, presumably
reflecting human readable target data.
Are you just suggesting that iText should support DOS and Linux line endings?
Is a viewer expressing a preference for one or the other?

Do you have a PDF file received after going through a profanity and patriotism
scanner too?

I'm genuinely curious now, inquiring minds want to know.

I opened it using TextPad in binary mode. It consistently uses 0A as EOL and
contains no 0D. I understand that the PDF 1.4 spec does not require such
consistency for EOL. However, it could be the reason that the Java mail transfer
encoder messed up. I will dig more.
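
A sketch of why end-of-line rewriting matters for a PDF in particular: the cross-reference table stores byte offsets, so any transport that turns a bare 0A into 0D 0A both inflates the file and shifts everything after the first line break. The fragment below is a made-up, incomplete PDF-like byte string used only to show the offset shift, and EolRewrite is a hypothetical name. Whether a given viewer then shows a blank page or an error is not something this sketch can tell.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class EolRewrite {
    /** Rewrites every LF not preceded by CR as CRLF, as a text-mode transport might. */
    static byte[] lfToCrlf(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length + 16);
        int previous = -1;
        for (byte b : in) {
            if (b == '\n' && previous != '\r') {
                out.write('\r');
            }
            out.write(b);
            previous = b;
        }
        return out.toByteArray();
    }

    /** Index of the first occurrence of needle in haystack, or -1 if absent. */
    static int indexOf(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Made-up fragment, LF-only like the file described above.
        byte[] original = "%PDF-1.4\n1 0 obj\n<<>>\nendobj\nxref\n"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] rewritten = lfToCrlf(original);
        byte[] marker = "xref".getBytes(StandardCharsets.US_ASCII);
        System.out.println("length: " + original.length + " -> " + rewritten.length);
        System.out.println("xref offset: " + indexOf(original, marker)
                + " -> " + indexOf(rewritten, marker));
    }
}
```

A file that already used CRLF throughout would pass through such a rewrite unchanged. That is one possible explanation (a guess, not established here) for why some PDFs survive a transport that damages others.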

I realized that I mentioned a commercial PDF software name. It was not
intentional. I sincerely apologize if it bothers anyone.

Accurate and relevant information can't bother anyone - do you want us to guess?
LOL.

Regards,

Jiangang Song

On Wed, May 26, 2010 at 11:53 PM, Jiangang Song wrote:

Michael,

Your points are well-taken.

Michael wrote:

This means that your quoted-printable encoder does not do a thorough job,
either because it is buggy or because you have not told it that the data to
encode is not text where a single carriage return, a single line feed, and a
carriage return line feed combination all mean the same.

Regards,

Jiangang Song


Re: [iText-questions] Blank PDF after it is transfered through SMTP

2010-05-27 Thread Mike Marchywka

Then you’ve been REALLY LUCKY – since QP and PDF have never gotten
along.

I guess my interest here was in determining how primitive a valid PDF would
have to be if it was assured of being ASCII, if that is even possible.
If you could write out PDFs with such a constraint they may work better
with some other tools, but obviously you'd expect to drop many things
and make files even bigger (which is often fine for some intermediate
things like object files).

From: Jiangang Song

I appreciate your response. And I was well aware of the RFC
spec before I posted. We have been using Quoted-printable to transfer
PDF for the past 10 years.

I thought my question had just missed something obvious but
I guess if you'd even looked at it you would have done a binary diff first,
found the missing high bits and CRLF issues, made histograms of historical PDF
files and found no bad chars, recognized the problem, and then
just asked if itext can generate pure ASCII pdf files. Not an unreasonable
question, see comments above. This is a reason however not to
always use the latest and greatest features; sometimes they just don't
work with the existing stuff.
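
A rough sketch of the histogram check mentioned above, in Python rather than with iText. It counts the bytes that a 7-bit, line-oriented transport is likely to disturb. The function name `transport_risk` and the sample bytes are made up for illustration; real input would come from reading the PDF in binary mode.

```python
from collections import Counter

def transport_risk(data: bytes) -> dict:
    """Summarize the bytes in `data` that a 7-bit text transport may alter."""
    hist = Counter(data)                      # byte value -> count
    crlf = data.count(b"\r\n")
    return {
        "high_bit": sum(n for b, n in hist.items() if b > 0x7F),
        "crlf": crlf,
        "lone_cr": hist[0x0D] - crlf,         # CR not followed by LF
        "lone_lf": hist[0x0A] - crlf,         # LF not preceded by CR
        "pure_ascii": all(b <= 0x7F for b in hist),
    }

# Hypothetical sample: a PDF-like header with a binary comment line.
sample = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\r1 0 obj\r\n<< >>\nendobj\n"
report = transport_risk(sample)
print(report)
```

Running this over a batch of older files that survived the mail and over the new failing file would show whether the new one contains byte classes the old ones never had.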


Re: [iText-questions] Question about converting HTML to PDF

2010-05-26 Thread Mike Marchywka

 Date: Wed, 26 May 2010 18:30:42 +0300
 From: dhryvas...@serena.com
 To: itext-questions@lists.sourceforge.net
 Subject: [iText-questions] Question about converting HTML to PDF

 Hi all -

 I am using iText and I try to convert HTML to PDF using this library. It
 works well for me with simple HTML.
 The question that I have is: does iText support converting HTML which
 includes css references and JavaScript to PDF? If I have JavaScript embedded
 in my HTML will it work in the generated PDF? Is there a way to do this?

There is a webkit app that does this, 

http://code.google.com/p/wkhtmltopdf/


I guess I would ask a related question that seems to be answered by
the above app: has anyone considered
using itext or other tools with browsers such as webkit? Presumably
webkit, as an example, knows how to render (by definition almost it
is right, as usually people want to copy what they see on [some] browser
even if there is no standard that captures quirks and bugs LOL).
Apparently webkit generates things like DOMs and structures for drawing,
so you could consider several ways to interface or mix and match tools.

For example, some JNI interface between your java app and a modified
webkit build OR an intermediate language such that this would do something
useful:

webkit_tool -dump_render_tree http://xxx.com | java -jar my_itext_app > pdf_of_web.pdf


I will admit right now that, much like "grep the source code" was my prior
answer for everything, webkit (an opensource browser engine) seems like the
answer to everything that involves a browser.


 Thanks in advance,
 Denys
  

Re: [iText-questions] Image Speed

2010-05-25 Thread Mike Marchywka

Although I looked through a few different threads, I couldn't find anything
that answered my exact problem (apologies if I missed something).

I am creating a PDF document that needs to support many images - upwards of
20 unique images. Generating the PDF takes ~1 second per image (my testing
determined images are the bottleneck), which is a problem as I need to
generate these PDFs dynamically, and this is just too long to wait :( ...

Here is the code I am using for each image:

image = Image.getInstance(url);
image.setAbsolutePosition(left, top - depth);
image.scaleAbsolute(width, depth);
d.add(image);

About half of the time per image is consumed by image =
Image.getInstance(url); - unfortunately, I don't think there's an
alternative here, however if anyone has a faster way of doing this, that'd
be appreciated.

Does this point to some other machine? Or a local file on disk somewhere?
You can of course cache these or maybe compress, but in any case this is not a
PDF problem if limited by IO.

However, the other time consumption is due to actually adding the images to
the document - my question is if there's a way to speed up this step.

Of course, if this is the best performance I can expect from iText, that
would be great to know too, so that I can start looking into other PDF
libraries.

Presumably you'd like to get some indication that the
other libraries could be faster by determining that there is
a better code or algorithm alternative to that used by itext
(native code may be faster too). It could just be that the task
takes a lot of work. Often, however, you find
things like memory usage rather than instruction count
become the limiting issues: if you have lots of big images, VM will
still thrash them around unless you happen to get lucky.
In this case you could probably get tremendous speed ups
by only keeping what you need in memory and writing completed stuff
out as it is ready etc.
The answer of course is that you should dig down deeper and
see what the issue is more precisely. An itext person with
knowledge of the code may have some suspects but you
also could have some quirk specific to your system
that causes bigger problems.
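
One way to "dig down deeper" is to time the two stages separately before blaming either. The sketch below is in Python with stand-in functions, not iText code; `fake_fetch` and `fake_process` are hypothetical placeholders for the download step (Image.getInstance(url) in the question) and the add-to-document step (d.add(image)).

```python
import time

def time_stages(items, fetch, process):
    """Run fetch then process on each item, returning total seconds per stage."""
    totals = {"fetch": 0.0, "process": 0.0}
    for item in items:
        t0 = time.perf_counter()
        raw = fetch(item)
        t1 = time.perf_counter()
        process(raw)
        t2 = time.perf_counter()
        totals["fetch"] += t1 - t0
        totals["process"] += t2 - t1
    return totals

# Stand-ins for the real work: a slow "download" and a quicker "add to document".
def fake_fetch(url):
    time.sleep(0.02)
    return b"x" * 1000

def fake_process(raw):
    time.sleep(0.005)

totals = time_stages(["img1", "img2", "img3"], fake_fetch, fake_process)
print(totals)
```

The same bracketing with timestamps works in Java around the two calls from the question; the point is only to get per-stage totals rather than one number per image.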


Re: [iText-questions] Blank PDF after it is transfered through SMTP

2010-05-25 Thread Mike Marchywka

From: js...@hotmail.com
To: itext-questions@lists.sourceforge.net
Date: Tue, 25 May 2010 18:37:45 -0400
Subject: Re: [iText-questions] Blank PDF after it is transfered through SMTP

Thank you for pointing out shave bytes.

In fact, this time it is inflated bytes. Comparing the pdf file generated
directly and the one transferred through SMTP using
content-transfer-encoding: quoted-printable, all 0A is inflated to 0D 0A
and all 0D is also inflated to 0D 0A. There is no other difference. Just this
minor inflation blows up acrobat reader and it shows up as blank pdf. (There
is no such inflation if base64 is used as content-transfer-encoding.)

What exactly are these characters? Why might this make sense with some data
types?

So the pdf generated by iText contains either 0A or 0D but not 0D 0A
together. Is this by design? Or is it configurable?

I guess if it did this consistently, you could use dos2unix or sed to fix the
file.
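
To make the dos2unix idea concrete, here is a small Python sketch. It mimics the damage as reported in this thread (every 0A and every 0D becoming 0D 0A) and then tries the obvious repair. The function names and sample bytes are illustrative only, and this is not what the mail software itself runs.

```python
import re

def inflate_line_endings(data: bytes) -> bytes:
    """Mimic the reported damage: CRLF, lone CR and lone LF all become CRLF."""
    return re.sub(rb"\r\n|\r|\n", b"\r\n", data)

def crlf_to_lf(data: bytes) -> bytes:
    """What a dos2unix-style repair does: every CRLF becomes LF."""
    return data.replace(b"\r\n", b"\n")

lf_only = b"1 0 obj\n<< >>\nendobj\n"
mixed = b"1 0 obj\r<< >>\nendobj\n"

# Repair works when the original used LF everywhere...
print(crlf_to_lf(inflate_line_endings(lf_only)) == lf_only)   # True
# ...but not when it mixed CR and LF, since both inflate to the same bytes.
print(crlf_to_lf(inflate_line_endings(mixed)) == mixed)       # False
# Every inserted byte also shifts the offsets of everything after it.
print(len(inflate_line_endings(lf_only)) - len(lf_only))      # 3
```

The offset shift matters because a PDF's cross-reference table records byte offsets of objects, and 0A or 0D bytes inside binary streams are data rather than line endings. So the repair is only safe if the original file is known to use one line ending consistently and to contain no binary streams.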

P.S.: all tests are on the Windows platform. Attached page_numbers.pdf is
generated directly and test.pdf is received through email as described above
using quoted-printable encoding.

See this for example; learn to use the IETF or other standards groups for
these types of issues:
http://tools.ietf.org/rfc/rfc2045.txt

6.6. Canonical Encoding Model

There was some confusion, in the previous versions of this RFC,
regarding the model for when email data was to be converted to
canonical form and encoded, and in particular how this process would
affect the treatment of CRLFs, given that the representation of
newlines varies greatly from system to system, and the relationship
between content-transfer-encodings and character sets. A canonical
model for encoding is presented in RFC 2049 for this reason.

6.7. Quoted-Printable Content-Transfer-Encoding

Date: Mon, 24 May 2010 18:00:38 +0200
From: i...@1t3xt.info
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] Blank PDF after it is transfered through SMTP

Jiangang Song wrote:
Or is there anything wrong with my usage of Java mail?

The blank page problem is (as documented in the book) caused by the
fact that some applications (such as Java mail?) shave bytes.

PDF is a binary file format. You need to transfer it as a binary file.
If you open up the PDF with the shaved bytes in a text editor, you'll
see that there are lots of question marks. Those are bytes that have
lost a bit due to the way you've transferred them.

You need to make sure that you transfer the file as a binary file.
--
This answer is provided by 1T3XT BVBA
http://www.1t3xt.com/ - http://www.1t3xt.info


Re: [iText-questions] Image Speed

2010-05-25 Thread Mike Marchywka

 Date: Tue, 25 May 2010 16:39:42 -0700
 From: jdebr...@gmail.com
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] Image Speed

 Does this point to some other machine? Or a local file on disk somewhere? 

 This URL points to a photo hosted by our photo server - this is the only way
 to access these images unfortunately. However, do you think it would help to
 compress the image somehow before calling image.getInstance()? Caching does
 no good as there will very rarely be repetition in the images used :( ...

Unlikely to be due to transfer time; it could even be disk IO on the server,
but that is only a non-itext related suspect.


Re: [iText-questions] Blank PDF after it is transfered through SMTP

2010-05-24 Thread Mike Marchywka

From: js...@hotmail.com
To: itext-questions@lists.sourceforge.net
Date: Mon, 24 May 2010 11:54:29 -0400
Subject: [iText-questions] Blank PDF after it is transfered through SMTP

I tried to send generated PDF through SMTP using Java mail api. It puzzled me
that the content of PDF once received in email is blank unless the
Content-Transfer-Encoding is set to base64. For example,

Do you have any idea what base64 encoding would do? That
may be a good place to start. You could for example extract the
text of your pdf and probably send that without complication
( shameless taunt for response LOL ). What do you mean by blank?
You opened it in a viewer and got a blank page, or a zero length file?
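
As a quick illustration of what base64 does to arbitrary bytes, here is a sketch using Python's standard library (nothing to do with iText or Java mail). The payload is simply every possible byte value.

```python
import base64

payload = bytes(range(256))            # every byte value, incl. 0x0A, 0x0D, >0x7F
encoded = base64.b64encode(payload)    # only A-Z, a-z, 0-9, '+', '/', '='

print(all(b < 0x80 for b in encoded))            # True: safe for 7-bit transport
print(b"\r" in encoded or b"\n" in encoded)      # False: no line endings to rewrite
print(base64.b64decode(encoded) == payload)      # True: exact round trip
print(len(encoded))                              # 344: 4 output chars per 3 input bytes
```

The cost is roughly a one third increase in size; in exchange, nothing in the encoded form looks like a line ending or a high-bit character that a mail relay might touch.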

Does iText support other Content-Transfer-Encoding like quoted-printable? Or
is there anything wrong with my usage of Java mail?

Only by accident would anyone here know anything about mail or SMTP.
iText supports PDF. You should look at the PDF spec and the meaning of
transfer encoding.

Also, the way to determine the answer is to binary diff the two PDF files,
before and after going through the mail. You will probably be able to get some
idea of what happened. Of course, if you really got a zero length file it may
not be too informative. I should probably know what happened, but
well, I try to only send ASCII in mail.
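
A minimal binary diff can be sketched in a few lines of Python. The helper names and the two sample byte strings are hypothetical; in practice the inputs would be the two files read with open(path, "rb").

```python
def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        return min(len(a), len(b))     # one input is a prefix of the other
    return None

def show_context(data: bytes, offset: int, width: int = 8) -> str:
    """Hex dump of the bytes around `offset`, for eyeballing the damage."""
    start = max(0, offset - width)
    return data[start:offset + width].hex(" ")

before = b"%PDF-1.4\n1 0 obj\n"
after = b"%PDF-1.4\r\n1 0 obj\r\n"     # hypothetical copy that went through the mail

where = first_difference(before, after)
print(where)                           # 8
print(show_context(before, where))
print(show_context(after, where))
```

Seeing 0a on one side and 0d 0a on the other at the first difference is usually enough to tell a line-ending rewrite from lost high bits.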

Regards,

Jiangang Song


Re: [iText-questions] text searching + opening a document directly to the search result location

2010-05-20 Thread Mike Marchywka

I'm just trying to get some clarity on what each of these features
is, both what itext and pdf viewers support and what you are trying to do.

Date: Thu, 20 May 2010 03:35:06 -0700
From: victor_ba...@yahoo.com
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] text searching + opening a document directly
to the search result location

Ok, we've settled the part with the document opening (PdfDestination -
LocalDestination - SetOpenAction).

You are asking below about search: do you mean you want the action on open to
be a search, or do you want it to vary depending on a parameter passed when the
viewer was invoked? Do you actually want a table of contents or to do a search?

Now what about the search? I need to find a certain text and then get the
coordinates for it so that I can set the PdfDestination.

Is this your own app or a web app running in a browser that opens up whatever
pdf viewer or plugin the user may have?
Is your question how do I use itext to search a pdf document? This has come
up before in the context of how do I extract text along with location on screen,
and often the response is that it is too complicated since it involves a
transformation matrix, but I did manage to get a simple utility to output
lines of form x y text for all text in a document. If you want the viewer to
do the search, isn't this a viewer question (how do I open a viewer to scroll
to the first occurrence of foo?).
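
Given a dump in that "x y text" form, finding where a heading sits is a small text-processing job. The Python sketch below assumes one chunk of text per line with whitespace-separated coordinates first; the exact layout of the utility's output and the sample lines are assumptions for illustration.

```python
def find_text(lines, needle):
    """Yield (x, y, text) for each 'x y text' line whose text contains needle."""
    for line in lines:
        parts = line.split(None, 2)          # x, y, then the rest of the line
        if len(parts) < 3:
            continue                         # skip lines with no text
        x, y, text = float(parts[0]), float(parts[1]), parts[2]
        if needle.lower() in text.lower():
            yield x, y, text

# Hypothetical dump of one page; coordinates are in PDF user space.
dump = [
    "72.0 720.0 1. Property 1",
    "72.0 700.0 blahblah and more blah.",
    "72.0 400.0 2. Property 2",
]

hits = list(find_text(dump, "2. property 2"))
print(hits)        # [(72.0, 400.0, '2. Property 2')]
```

The coordinates of the first hit are what a destination would need. Note that text extraction can split a phrase across several chunks, in which case a simple per-line match like this one will miss it.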

I've adopted this approach because the PDF help-file is provided by the
customer. He'll be reluctant to make a separate help file for each property.
And the help itself is too complex to simply transcribe it myself to HTML.

Well, typing apropos pdf I did find on my debian install there is something
called pdf2html, not sure how well it works but it may be an option. It depends
on what you mean by complex: intricate artwork of some kind or complex
interaction logic? It isn't hard to find lots of big customers who say gee, pdf
is a standard and looks good and the files are huge so there must be loads of
information in there, you'd be crazy not to use this.

I'm in the research stage right now, so time is precious. If iText
doesn't support these features please tell me so that I may look for another
solution elsewhere.

--
Message: 1
Date: Wed, 19 May 2010 09:00:44 -0500
From: Cameron Laird
Subject: Re: [iText-questions] text searching + opening a document
directly to the search result location
To: Post all your questions about iText here


On Tue, May 18, 2010 at 6:50 AM, Mike Marchywka wrote:

Date: Tue, 18 May 2010 04:30:48 -0700
From: victor_ba...@yahoo.com
To: itext-questions@lists.sourceforge.net
Subject: [iText-questions] text searching + opening a document directly
to the search result location

Hello!

In my app. I have a table. On each row there is a help button. The help
is provided in the form of a large PDF file.

If the user presses the help button on a row, the PDF should open
directly where the explanation for that row properties is.

Can I do this with iText (actually iTextSharp)? Can I search the document
(using the property name on that row) and then open the document to the
user, directly at that location?

This is a bit like the prior question about how do I use a servlet to
deal with big pdf files. The first answer might be, why are you using PDF in
this setting? Rather than having a help button provoke a search, wouldn't you
be better off doing the search previously, or can help point to arbitrary
locations? In the former case, html with fragments may work or just having
separate pages; in the latter case a DB may be more appropriate, although I
guess you could ask about PDF indexing or TOC capabilities.
Leonard, you care to explain how PDF is a good choice here? Thanks.
My interest of course is that I end up having to use some of these
creations that people design...

Victor, it occurs to me that
http://www.itworld.com/development/107909/tools-pdf-internal-links might
bear on your target.

Re: [iText-questions] text searching + opening a document directly to the search result location

2010-05-20 Thread Mike Marchywka

Date: Thu, 20 May 2010 05:27:32 -0700
From: victor_ba...@yahoo.com
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] text searching + opening a document directly
to the search result location

Hello again!

Ok, I'll try to make it a bit clearer. Imagine this table:

Is this in a browser or your own app?

If the user presses the btnHelp, the Pdf help-file should appear on the
screen.
This document has the explanations for all the properties. For example:

1. Property 1
blahblah and more blah.

2. Property 2
blh (tables, diagrams...)

Are the properties and help responses relatively static and known a priori,
or quite dynamic?

Because the help is amassed in one file, the client would want this behavior:
-user clicks on property n btnHelp
-pdf document opens scrolled directly to n. Property n. - much like the
bookmark behavior

Well, everyone gets requests like this; you may want to get a better idea of
what the actual end product should be from the user's viewpoint. If the concern
is general quality of the help text or there is some unique facility provided
by pdf it may make sense, but users who want information for an immediate
need may not need lots of pictures. You probably don't want a tutorial
as you are getting ready to submit a form to trade securities, launch a
missile, or land a plane.

I found the SetOpenAction method.
It executes when the document opens. But it needs some parameters. It needs a
LocalDestination. And here i thought that the text searching would come in.
I was thinking that:
- get Property Name from the table
- search n. property n string in the PDF
- get the coordinates, location of the string occurrence

Presumably the search is being done by the viewer, or do you want itext to do
this?

- set the Open Action
would do the trick.

But can I do that? Any other suggestions or solutions would be welcomed.

Get a better idea of what the user is supposed to experience and write scripts
to parse the source pdf into a suitable format, unless pdf is the right choice.

Thank you for your time and patience.

It is easier now than to find out later you have designed a system that I need
to use to do something simple. Many agencies have been sold on pdf and probably
try to do stuff like this all the time. There are some very good pdf products
generated each week by govt agencies that have pictorial information, but then
there are many submissions from people forced to make public declarations that
use pdf as a big way to hide data from automated processing etc etc etc.


Re: [iText-questions] iText Read Chuncks of PDF into java

2010-05-19 Thread Mike Marchywka

Date: Wed, 19 May 2010 08:52:00 +0200
From: i...@1t3xt.info
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] iText Read Chuncks of PDF into java

crimeunit wrote:
Dear all,

Does somebody else know maybe that I can use another library where I can
specially read out the links of content (to another pdf file) into a pdf?

Reading out the links is a completely different question.

Links (anchors, hyperlinks, external go to actions,...) are not part of
the page content stream; they are stored in Link annotations and very
easy to retrieve.

We just discussed this specific issue in another thread. The question
ultimately became, from my perspective, do you need to write
a custom piece of code to get links or can you stand on the shoulders
of giants, avoid reinventing the wheel and solve the problem
with cliches and command line tools such as

cat xxx.pdf | grep http

or better

cat xxx.pdf | convert_to_form_suited_for_manipulation | grep $unambiguous_link_thingy > all_links
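
The grep idea can be tightened a little by looking for /URI entries rather than any "http". The Python sketch below scans raw bytes for /URI actions written as literal strings. It is only a sketch: the pattern and sample object are assumptions, and it will miss links stored in compressed object streams, hex strings, or encrypted files, where a real PDF parser is needed.

```python
import re

# A /URI action value written as a literal string: /URI (http://...)
URI_PATTERN = re.compile(rb"/URI\s*\(((?:\\.|[^\\()])*)\)")

def find_uris(pdf_bytes: bytes) -> list:
    """Return the literal-string targets of /URI entries found in the raw bytes."""
    return [m.group(1).decode("latin-1") for m in URI_PATTERN.finditer(pdf_bytes)]

# Hypothetical fragment of an uncompressed Link annotation.
sample = (
    b"5 0 obj\n<< /Type /Annot /Subtype /Link /Rect [72 700 200 712]\n"
    b"/A << /S /URI /URI (http://itextpdf.com/book/) >> >>\nendobj\n"
)
print(find_uris(sample))    # ['http://itextpdf.com/book/']
```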

Your problem is that you are not using the correct terminology,
therefore it is impossible for anybody to answer your question.

This of course is a very common problem when just starting out
and it makes it hard to do key word searches. A lot of your
time is spent here but this is hardly unique to itext.
A command line tool to dump a pdf in human readable form (LOL)
with the right jargon could make this easier ( I dumped the
pdf and all the wazoodalle dictionary entries were blank)

This is why I usually talk around ill-posed questions time
and interest permitting.

I interpreted your question as a request to do something that is
impossible: you want to extract structure from a PDF that isn't
structured (a PDF that isn't tagged).

You won't find any tool that can do that.

If you can convert the PDF to text or pixels or any other
thing that may capture structure according to
some external pattern you may be able to use
existing text tools or, if this is worth enough effort,
OCR tools on pixels.
My recurring complaint is the FDA does or
has in the past accepted scanned PDF files for documentation
of clinical trial results of approved drugs (look for example
at dr...@fda various doc packages). This makes
automated usage of this voluminous data impossible,
and I tried OCR but it didn't work too well.
Many people who file govt documents don't like
automated data processing, which does make this format
a good choice. Calling this Accessdata is almost comical,
perhaps accesspictures LOL.

http://www.accessdata.fda.gov/scripts/cder/drugsatfda/
http://www.accessdata.fda.gov/drugsatfda_docs/nda/2004/125104s000_Natalizumab_Pharmr_P1.pdf

I did note the labels seem to be selectable and presumably you could get
data out of the cave drawings.


Re: [iText-questions] text searching + opening a document directly to the search result location

2010-05-18 Thread Mike Marchywka

Hello!

In my app. I have a table. On each row there is a help button. The help is
provided in the form of a large PDF file.

If the user presses the help button on a row, the PDF should open directly
where the explanation for that row properties is.

Can I do this with iText (actually iTextSharp)? Can I search the document
(using the property name on that row) and then open the document to the user,
directly at that location?

This is a bit like the prior question about how do I use a servlet to
deal with big pdf files. The first answer might be, why are you using PDF in
this setting? Rather than having a help button provoke a search, wouldn't you
be better off doing the search previously, or can help point to arbitrary
locations? In the former case, html with fragments may work or just having
separate pages; in the latter case a DB may be more appropriate, although I
guess you could ask about PDF indexing or TOC capabilities.
Leonard, you care to explain how PDF is a good choice here? Thanks.
My interest of course is that I end up having to use some of these
creations that people design...

Thanks,
Victor


Re: [iText-questions] iText causing thread stuck

2010-05-17 Thread Mike Marchywka

 Date: Mon, 17 May 2010 09:02:24 -0700
 From: msto...@autonomy.com
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] iText causing thread stuck

 I googled stuck executethread, and the discussions I turned up were
 about a fixed max thread run time that was configurable (and seems to
 default to 600 seconds, 10 minutes). Your self-tuning thing may be
 ignoring that in favor of something Fancy.

 I suspect you need to find a way to tell your server that this thread is
 going to take a long time, and that's okay... A WebLogic question, not
 an iText one.

 Now if you want to figure out how to speed up iText, that belongs
 here... But most of the efficiency improvements available are in the IO,
 while your thread is being halted in what looks like code involved in
 building your PdfTable in memory, not writing it out.

 You might throw in some logic to take any table of more than X rows and
 generate them in separate documents in series, such that you can get
 each part done reasonably quickly. Once all the portions of the table
 have been generated, you can stitch the PDFs back together, AT THE PAGE
 LEVEL. It is, for all practical purposes, impossible to extract rows
 and append them to existing documents. Pages or bust.

 --Mark Storer
 Senior Software Engineer
 Cardiff.com

 import legalese.Disclaimer;
 Disclaimer DisCard = null;



 -Original Message-
 From: stitches [mailto:sarifi...@sbcglobal.net]
 Sent: Friday, May 14, 2010 12:21 PM
 To: itext-questions@lists.sourceforge.net
 Subject: [iText-questions] iText causing thread stuck


 Hi - I'm relatively new to Java and iText. I took over a
 project for someone else, we're running it in Weblogic 10.3
 We have a dynamic report (dynamic in the sense that the
 columns and the order of columns are chosen by the end user,
 and the different columns can have different rowspans), which
 we would like to export to PDF. If the dataset is relatively
 small, we have no problems. But when the table is extremely
 large, the export hangs in a thread stuck. I really hope you
 can help me out, this is quite urgent.
 Below is the error we are receiving. Thanks in advance.

 Thread-36 [STUCK] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)' priority=1, DAEMON {
 com.lowagie.text.pdf.PdfLine.getChunk(Unknown Source)
 com.lowagie.text.pdf.PdfCell.firstLineRealHeight(Unknown Source)
 com.lowagie.text.pdf.PdfCell.setBottom(Unknown Source)
 com.lowagie.text.pdf.PdfDocument.addPdfTable(Unknown Source)
 com.lowagie.text.pdf.PdfDocument.add(Unknown Source)
 com.lowagie.text.Document.add(Unknown Source)
 com.novartis.dra.tap.servlets.CustomPDFGenerator.doPost(CustomPDFGenerator.java:64)
 com.novartis.dra.tap.servlets.CustomPDFGenerator.doGet(CustomPDFGenerator.java:59)
 javax.servlet.http.HttpServlet.service(HttpServlet.java:700)
 javax.servlet.http.HttpServlet.service(HttpServlet.java:815)
 weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:224)
 weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:108)
 weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:198)
 weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:175)
 weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3468)
 weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:308)
 weblogic.security.service.SecurityManager.runAs(Unknown Source)
 weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2116)
 weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2038)
 weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1372)
 weblogic.work.ExecuteThread.execute(ExecuteThread.java:198)
 weblogic.work.ExecuteThread.run(ExecuteThread.java:165)
 }
 --

See some of the servlet lists. Servlets are meant to be diminutive (hence
the "let") things that run on a server to handle short tractable
requests while a remote connection stays open. Even if you
could keep the thread alive you may not maintain the
connection: what if the request comes in over a wireless link or
two? You need to change the paradigm.
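
One common shape for that change of paradigm is: accept the request, hand back a ticket right away, do the slow work in the background, and let the client poll for the result. The Python sketch below shows the idea only; the class `JobQueue`, its method names, and `build_report` are hypothetical, and a servlet container would use its own executor and storage.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

class JobQueue:
    """Accept long-running work, hand back a ticket, and let callers poll."""

    def __init__(self, workers: int = 2):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs = {}                      # ticket -> Future

    def submit(self, func, *args) -> str:
        ticket = uuid.uuid4().hex
        self._jobs[ticket] = self._pool.submit(func, *args)
        return ticket                        # returned to the client immediately

    def status(self, ticket: str) -> str:
        future = self._jobs[ticket]
        if not future.done():
            return "running"
        return "failed" if future.exception() is not None else "done"

    def result(self, ticket: str):
        return self._jobs[ticket].result()   # blocks until the job finishes

def build_report(rows: int) -> str:
    # Stand-in for the slow PDF generation step.
    return f"report with {rows} rows"

queue = JobQueue()
ticket = queue.submit(build_report, 100000)
print(queue.status(ticket))     # "running" or "done", depending on timing
print(queue.result(ticket))     # report with 100000 rows
```

With this split the request thread returns in milliseconds, so neither the stuck-thread timer nor a dropped connection decides whether the report gets built.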






  

Re: [iText-questions] Bolded text is fuzzy in PDFs

2010-05-12 Thread Mike Marchywka

Just to emphasize the lack of pdf tools, and goad anyone who can into showing
me I'm wrong, let me illustrate how you could find some suspects on your own
with well-known tools.
Imagine what you could do if you could convert a pdf to a canonical or
intermediate form that played nice with decades of prior work LOL.
(These may get distorted by hotmail. hotmail of course thinks everything is
html...)

Your earlier comments about truetype are plausible with a quick scan of the
two docs for overt font refs,


wget -O B -S -v http://www.windwardreports.com/temp/primf.pdf
wget -O G -S -v http://www.windwardreports.com/temp/primf2.pdf
strings G > gs
strings B > Bs
mv gs Gs
more Gs
more Bs
more Bs | grep -i font


marchywka:/home/marchywka/junk# more Gs | grep -i font | sed -e 's/[^a-zA-Z ]//g' | sed -e 's/  */ /g'
 obj Contents R MediaBox Parent R Resources Font F R F R F R ProcSet PDF Text 
ImageB ImageC ImageI XObject Type Page endobj
 obj Contents R MediaBox Parent R Resources Font F R F R F R ProcSet PDF Text 
ImageB ImageC ImageI XObject Type Page endobj
 obj Contents R MediaBox Parent R Resources Font F R F R F R F R ProcSet PDF 
Text ImageB ImageC ImageI XObject Type Page endobj
 obj Annots R R Contents R MediaBox Parent R Resources Font F R F R F R ProcSet 
PDF Text ImageB ImageC ImageI XObject Type Page endobj
 obj BaseFont DIMJAQCalibri Encoding WinAnsiEncoding FirstChar FontDescriptor R 
LastChar Subtype TrueType Type Font Widths endobj
 obj BaseFont DKIIHVArialBold Encoding WinAnsiEncoding FirstChar FontDescriptor 
R LastChar Subtype TrueType Type Font Widths endobj
 obj BaseFont DWWVFAArial Encoding WinAnsiEncoding FirstChar FontDescriptor R 
LastChar Subtype TrueType Type Font Widths endobj
 obj BaseFont DOFDBOWingdings FirstChar FontDescriptor R LastChar Subtype 
TrueType Type Font Widths endobj
 obj Ascent CapHeight Descent Flags FontBBox FontFile R FontName DIMJAQCalibri 
ItalicAngle StemV Type FontDescriptor endobj
 obj Ascent CapHeight Descent Flags FontBBox FontFile R FontName 
DKIIHVArialBold ItalicAngle StemV Type FontDescriptor endobj
 obj Ascent CapHeight Descent Flags FontBBox FontFile R FontName DWWVFAArial 
ItalicAngle StemV Type FontDescriptor endobj
 obj Ascent CapHeight Descent Flags FontBBox FontFile R FontName 
DOFDBOWingdings ItalicAngle StemV Type FontDescriptor endobj


marchywka:/home/marchywka/junk# more Bs | grep -i font | sed -e 's/[^a-zA-Z ]//g' | sed -e 's/  */ /g'
Parent RContents RTypePageResourcesXObjectimg Rimg RProcSet PDF Text ImageB 
ImageC ImageIFontF RF RF RMediaBox 
Parent RContents RTypePageResourcesXObjectimg Rimg RProcSet PDF Text ImageB 
ImageC ImageIFontF RF RMediaBox 
Parent RContents RTypePageResourcesXObjectimg Rimg RProcSet PDF Text ImageB 
ImageC ImageIFontF RF RF RF RMediaBox 
Parent RContents RTypePageResourcesXObjectimg Rimg RProcSet PDF Text ImageB 
ImageC ImageIFontF RF RF RMediaBox Annots R
FontBBox CapHeight TypeFontDescriptorFontFile RStemV Descent Flags 
FontNameOZABETArialBoldMTAscent ItalicAngle 
BaseFontOZABETArialBoldMTCIDSystemInfoOrderingIdentityRegistryAdobeSupplement W 
TypeFontSubtypeCIDFontTypeFontDescriptor RDW CIDToGIDMapIdentity
DescendantFonts 
RBaseFontOZABETArialBoldMTTypeFontEncodingIdentityHSubtypeTypeToUnicode R
FontBBox CapHeight TypeFontDescriptorFontFile RStemV Descent Flags 
FontNameTOCCAPArialMTAscent ItalicAngle 
BaseFontTOCCAPArialMTCIDSystemInfoOrderingIdentityRegistryAdobeSupplement W 
TypeFontSubtypeCIDFontTypeFontDescriptor RDW CIDToGIDMapIdentity
DescendantFonts 
RBaseFontTOCCAPArialMTTypeFontEncodingIdentityHSubtypeTypeToUnicode R
BaseFontHelveticaTypeFontEncodingWinAnsiEncodingSubtypeType
BaseFontZapfDingbatsTypeFontSubtypeType
marchywka:/home/marchywka/junk# 
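
The same scan can be sketched in Python for anyone who would rather not chain
strings, grep and sed. This is only a rough sketch under the same assumption as
the shell version: the font dictionaries must sit uncompressed in the file. The
function name list_fonts and the two regular expressions are invented here for
illustration and are not part of iText or any PDF library.

```python
import re

# Name that follows /BaseFont, e.g. "/BaseFont /DIMJAQ+Calibri".
BASEFONT = re.compile(rb"/BaseFont\s*/([^\s/<>\[\]()]+)")
# Name that follows /Subtype, e.g. "/Subtype /TrueType".
SUBTYPE = re.compile(rb"/Subtype\s*/([A-Za-z0-9]+)")


def list_fonts(pdf_bytes):
    """Return sorted (base_font, subtype) pairs seen in uncompressed objects."""
    fonts = set()
    # Splitting on "endobj" gives chunks that each hold roughly one object.
    for obj in pdf_bytes.split(b"endobj"):
        name = BASEFONT.search(obj)
        if name is None:
            continue  # not a font dictionary
        subtype = SUBTYPE.search(obj)
        fonts.add((
            name.group(1).decode("latin-1"),
            subtype.group(1).decode("latin-1") if subtype else "?",
        ))
    return sorted(fonts)
```

Reading the downloaded files G and B as bytes and passing them to list_fonts
should list the TrueType fonts in one and the CIDFont/Type0 fonts in the other,
provided the objects are not packed into compressed object streams.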



I've also got utilities for building vocabulary lists and diffing them, etc., if
you want to find words in one list that
are missing from the other, for example.
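
For illustration only, a vocabulary diff of that kind can be sketched in a few
lines of Python. This is not the utility mentioned above; the names vocabulary
and missing_words are made up here, and "word" is assumed to mean a run of
ASCII letters, compared case-insensitively.

```python
import re


def vocabulary(text):
    """Return the set of lowercased alphabetic words in text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def missing_words(text_a, text_b):
    """Return, sorted, the words that occur in text_a but not in text_b."""
    return sorted(vocabulary(text_a) - vocabulary(text_b))
```

Feeding it the strings output of the two PDFs would show which tokens appear in
only one of them.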


Mike Marchywka
1975 Village Round
Marietta GA 30064
415-264-8477 (w)- use this
404-788-1216 (C)- leave message
989-348-4796 (P)- emergency only
marchy...@hotmail.com
Note: If I am asking for free stuff, I normally use for hobby/non-profit
information but may use in investment forums, public and private.
Please indicate any concerns if applicable.








 From: ad...@windward.net
 To: itext-questions@lists.sourceforge.net
 Date: Wed, 12 May 2010 10:40:42 -0600
 Subject: Re: [iText-questions] Bolded text is fuzzy in PDFs

 Hi Leonard,

 Here are links to both the pdf's.

 Bad - http://www.windwardreports.com/temp/primf.pdf
 Good - http://www.windwardreports.com/temp/primf2.pdf

 thanks

 -Original Message-
 From: itext-questions-requ...@lists.sourceforge.net 
 [mailto:itext-questions-requ...@lists.sourceforge.net]
 Sent: Wednesday, May 12, 2010 12:23 AM
 To: itext-questions@lists.sourceforge.net
 Subject: iText-questions Digest, Vol 48, Issue 33

 Send iText-questions mailing list

Re: [iText-questions] how to detect remote links in a PDF ?

2010-05-11 Thread Mike Marchywka

From: lrose...@adobe.com
To: itext-questions@lists.sourceforge.net
Date: Mon, 10 May 2010 19:01:15 -0700
Subject: Re: [iText-questions] how to detect remote links in a PDF ?

There is no such thing as canonical PDF - anything that complies with the
PDF specification is valid. That allows for various uses of compression,
ASCII encoding, etc.

Well, not really. If there are rules for the PDF standard then you could in
fact create some alternative representation- it could
be super big, verbose, complicated, etc but it may be a useful intermediate
form for various types of work
such as debug or adhoc editing where you don't want to waste time writing
custom code to do something simple.

No argument!

BUT an intermediate format (or an alternative format) and a canonical
format are VERY VERY different things...

Well, at least canonical would be something like a PDF that doesn't do anything
fancy and has rules for otherwise
arbitrary choices, so that you could do simple things like ASCII searches and
maybe binary diffs to test for pixel equality, etc.

There are many folks who have developed alternative representations of PDF,
whether in XML or other formats, including Adobe ourselves. For example,
Adobe has a project codenamed Mars on our Labs site () which describes an
XML+ZIP-based representation of PDF. It supports all of the features of PDF
from PDF 1.7. We provide some tooling for Acrobat Reader, and you are
welcome to develop your own.

But again, that's NOT canonical - just alternative.

But that would work for the original purpose too. Maybe you should mention
these on itext somewhere and refer
people to them. It is hard to say you will be accused of being biased any more
than you already are, and
if the tools work who cares if you are biased? LOL.

From your terse descriptions, that even sounds like a sane and workable
approach, not what I would have expected ( sorry,
had to interject LOL).

This is also not irrelevant to the itext implementation, as a prior thread was
talking about optimizations at
an algorithm level. If you had some attributes of a parsed or intermediate form
that make various
manipulations easy, it may be a good thing for itext to parse into or even
write out for other canned
(itext based or not) tools to use.

cat pdf | itext_parse_to_intermediate_form | my_itext_tool | intermediate_to_pdf -O3 > new.pdf

Piping can be slow but obviously you can start mashing tools together etc.

That's why library such as iText exist - to provide you with higher level
APIs (where possible). They are what one would use to create
automated test tools, validators, etc. And many such tools already do exist
- so it's definitely doable (and has been done).

If you took that attitude you couldn't even hide behind "but PDF is a
standard", since then the argument is "well, I have API
xyz and we can do anything with it if you use my ABC format". I guess having
a list would help; is there a PDF
developer download somewhere with tools like this?

Adobe Acrobat Professional includes a PDF validator feature as part of its
Preflight module, and has since version 7. It is the only publicly available
validator that I am aware of, though I have spoken to at least a half-dozen
commercial PDF vendors that have told me that they have developed their own
validators for their own use.

There used to be two limited open source validators - JHOVE () and
Multivalent (). But to my knowledge, neither is currently supported/updated.
Since both were Java-based OSS, I would think you could pick them up and run
with them if you wished.

Ok, sounds like reasonable starting points.

I'm not saying it is trivial to do any of this, but it does seem much of the
traffic here never gets
referred to any simple diagnostics.

Leonard

___
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions


Buy the iText book: http://www.itextpdf.com/book/
Check the site with examples before you ask

Re: [iText-questions] What action is requred in terms of License

2010-05-11 Thread Mike Marchywka

Date: Tue, 11 May 2010 15:49:47 +0200
From: i...@1t3xt.info
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] What action is requred in terms of License

Michael Olenick wrote:
People -- Call or write to Bruno's sales agent.

Thanks!

I've had a speed course in economics at http://www.vlerick.com/ last
year, discovering there's more to the IT business than writing code.

It's all about the business model, not just the business model used for
iText, but also the business model of the end user/developer.

I think many of the issues we come up against are actually
due to thinking about business before technological issues.
I won't mention any names, but if you know of any large publicly
traded companies that derive significant revenue from PDF,
then this may or may not relate to those entities.

Walled gardens have most recently been tried by cell phone
companies and if you search SEC filings for such terms,
there as of late has been a recognition that they are bad
for business- users and developers get mad.

So, to the extent business comes before making useful
products, something to consider.

We all want to make money but creating artificial barriers
and designing products that lock people into fixed
vendors or ways of thinking is rarely helpful.


Re: [iText-questions] how to detect remote links in a PDF ?

2010-05-10 Thread Mike Marchywka












 Date: Sun, 9 May 2010 23:08:51 +0200
 From: papa...@googlemail.com
 To: itext-questions@lists.sourceforge.net
 Subject: [iText-questions] how to detect remote links in a PDF ?

 Colleagues,

 For an application, one needs to detect the hyperlinks (i.e. done with
 Chunk.setRemoteGoto) in a PDF which point to an other PDF, can someone
 point me to a solution ?

Question for Leonard or others who have read the spec: if you literally ONLY
want to list the links, not parse the document or determine any context,
are they likely to be hidden, or can you just use text
tools to find strings that start with or contain http? For example,


cat *.pdf ../Desktop/*.pdf | sed -e 's/[^a-zA-Z0-9/:.?]/\n/g' | grep http
cat *.pdf ../Desktop/*.pdf | strings | grep http

These seem to work in that they find things with http, but I'm not sure what
would be missing. Many of the hits seem to be surrounded by XML or prefixed
with /A, but I'm not sure what other contexts may exist.
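
For comparison, the same brute-force scan can be sketched in Python. It is
only a sketch and shares the limits of the shell version: it sees only bytes
stored uncompressed in the file, and it finds only targets that are written
out as http or https URLs. A remote go-to action (/S /GoToR) names its target
file through an /F entry, which may be a plain relative file name with no http
in it, so such links would presumably be missed. The name find_urls and the
character set accepted in a URL are assumptions made for this example.

```python
import re

# A deliberately narrow idea of what may appear in a URL; the trailing "-" is literal.
URL = re.compile(rb"https?://[A-Za-z0-9/:.?&=%_~#+-]+")


def find_urls(raw):
    """Return the sorted, de-duplicated URLs found in raw (uncompressed) bytes."""
    return sorted({m.group(0).decode("ascii") for m in URL.finditer(raw)})
```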

Thanks.







 Thank you very much in advance,
 Pieter Vankeerberghen


Re: [iText-questions] how to detect remote links in a PDF ?

2010-05-10 Thread Mike Marchywka

 From: lrose...@adobe.com
 To: itext-questions@lists.sourceforge.net
 Date: Mon, 10 May 2010 06:44:13 -0700
 Subject: Re: [iText-questions] how to detect remote links in a PDF ?

 Prior to PDF 1.5, you could have done a grep (or equivalent) since only 
 stream objects were compressed. However, as of PDF 1.5, we now have object 
 streams, where groups of objects are placed into a stream and then 
 compressed - which means that grep will no longer work.

 Adobe Acrobat 9 will ALWAYS (unless restricted by a specific ISO standard, 
 such as PDF/A) use object stream compression to keep file sizes down. I've 
 been trying to recommend that other products do the same.

Is there some utility, like in pdftk, to convert a PDF with arbitrary stuff in
it to some standard or
canonical format that lets it be used with other tools, so you don't have to
write custom code for
every little trivial variation of a thing you wish to accomplish? For example,

cat xxx.pdf | pdf_to_standard_form | grep http 

Obviously applicability would go beyond the immediate question, but it would
also let people writing itext
code check their results more easily than "it opened in
proprietary adobe product X
but in black box Y it greyed out 3 menu options and wouldn't let me save it
unless blah blah blah".
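
As a rough idea of what such a pdf_to_standard_form step might do, here is a
sketch in Python using only the standard library. It inflates every stream that
zlib can decode and appends the result to the raw bytes, so that text tools can
also see inside Flate-compressed streams, including object streams. It is not a
PDF parser: it ignores encryption, predictors, chained filters and the /Length
entry, and the name expand_streams is invented here.

```python
import re
import zlib

# Bytes between "stream" + EOL and the next "endstream" (may include a trailing EOL).
STREAM = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def expand_streams(raw):
    """Return raw followed by the inflated content of every zlib-decodable stream."""
    parts = [raw]
    for match in STREAM.finditer(raw):
        try:
            # decompressobj tolerates the EOL bytes left after the zlib data.
            parts.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            pass  # not Flate data, or a filter this sketch does not handle
    return b"\n".join(parts)
```

Writing the returned bytes to a file and running strings and grep over that
file gives roughly the pipeline described above, at the cost of a much larger
and no longer valid PDF.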

There is nothing wrong with a human readable end product but given the 
complexity of these things
it would be nice to use computers to automate certain things, like checking for 
links
or other attributes. Without ability to use automated tools everything comes 
down to a long
menu chain and terse messages from products not designed for debug.

 So while there certainly exists lots of PDFs that you could grep, the numbers 
 are reducing daily...

 Leonard



Re: [iText-questions] how to detect remote links in a PDF ?

2010-05-10 Thread Mike Marchywka














 From: lrose...@adobe.com
 To: itext-questions@lists.sourceforge.net
 Date: Mon, 10 May 2010 18:09:15 -0700
 Subject: Re: [iText-questions] how to detect remote links in a PDF ?

 There is no such thing as canonical PDF - anything that complies with the 
 PDF specification is valid. That allows for various uses of compression, 
 ASCII encoding, etc.

 There are certainly tools out there that will uncompress/defilter all the 
 elements in the PDF so that it is plain text and can be searched using 
 text-only tools - though certainly that wouldn't help you for modifications 
 (for obvious reasons).

Well, not really. If there are rules for the PDF standard then you could in 
fact create some alternative representation- it could
be super big, verbose, complicated, etc but it may be a useful intermediate 
form for various types of work
such as debug or adhoc editing where you don't want to waste time writing 
custom code to do something
simple. XXX Intermediate Form is a very common file format :) I guess you 
could imagine expanding 
it to some XML format where you have decompressed the text and done something 
with the images, fonts, and formatting
information- no idea what. Essentially your claim is that PDF is so bizarre, 
unique, superlative,  and singular, nothing can possibly
equal it :) I just downloaded some schematic capture programs and those create 
documents that are inherently graphical-
schematics- but the essential features can be easily extracted as concise text 
netlists. 


 That's why library such as iText exist - to provide you with higher level 
 APIs (where possible). They are what one would use to create automated test 
 tools, validators, etc. And many such tools already do exist - so it's 
 definitely doable (and has been done).

If you took that attitude you couldn't even hide behind "but PDF is a standard",
since then the argument is "well, I have API
xyz and we can do anything with it if you use my ABC format". I guess having a
list would help; is there a PDF
developer download somewhere with tools like this? This reminds me of when I
first got here and you explained that
logical structure was available, but every time it comes up in a concrete rather
than hypothetical case
everyone says, "Sure, you could preserve structure, but it is too complicated to
be practical." In the present
case, you say the tools exist, but when someone shows up with an error from
Acrobat no one can point to a
tool to check the PDF.



 And let us not forget the expression - just because you only have a hammer, 
 doesn't mean everything is a nail!

That's fine if you have a list of tools somewhere, but I keep seeing the same
hammer being used, usually
an Acrobat Reader with the informative diagnostic "your pdf is damaged".
Again, I'm not saying this
is a fault with ADBE or PDF, but it would be nice to refer people to some list
of tools that give a better
diagnostic. In many cases, of course, all you really care about is the text and
the hammer gets almost everything
done. When you need the graphics, that is a different situation.

So ok I've only got one swiss army knife LOL.



 Leonard


Re: [iText-questions] Guidance Requested - Generating multipage output with header/footer and pg 1 layout

2010-05-05 Thread Mike Marchywka

Where have you learned this?
Who is spreading this kind of disinformation?
Why are you saying this?

In attachment, you can find a very simple example that proves
the opposite of what you've learned.

And more importantly: what is wrong with the documentation???

What is wrong with the second edition of the book that still causes
misconceptions like this? There are still a couple of months left to
find a remedy before the book goes to print.

Along the lines of your comments, I would guess the
problem would be that someone has to read it. Now, before
you dismiss this as a pointless joke, I would just
consider it as stating the obvious, as this
often is a route to great discovery (you needn't pat
me on the back at this point, I am doing so now).
Stating another obvious, we have computers.
With appropriately formatted electronic publications,
we have tools like grep to help us find what we need
with efficient use of resources. However, often
there is a problem with vocabulary and document
structure for the beginner wishing to become
familiar with a topic. Keywords don't help
if you don't know what they are, and if there is
no document structure, common word context
is hard to find. So, you can create neologisms
to make searching easier or, in the electronic
docs, create structure. fwiw.

While Leonard and others keep pointing out
that PDF has structural capabilities, every time
someone asks a question here that
lends itself to use of these facilities, the almost
unanimous opinion is, "sure it's
possible but it is too complicated or
hard and no one would use it".
I can grep javadocs and have some idea
of context since the rendered html is
fairly uniform even if not intended to be structured.
I can build my own indexes and remove common
words, reducing the time to learn
jargon etc.
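
To make the index idea concrete, here is a small Python sketch. It assumes the
documents have already been reduced to plain text. The stop-word list is a
token sample rather than a real one, and build_index is a name chosen for this
example.

```python
import re
from collections import defaultdict

# A tiny, illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it", "for"}


def build_index(docs):
    """Map each non-stop word to the sorted names of the documents containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOP_WORDS:
                index[word].add(name)
    return {word: sorted(names) for word, names in index.items()}
```

Words that end up in only one or two documents are often the jargon worth
learning first.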


Re: [iText-questions] Performance when flattening form fields

2010-04-26 Thread Mike Marchywka













 Date: Sun, 25 Apr 2010 22:14:02 -0700
 From: forum_...@trumpetinc.com
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] Performance when flattening form fields


 After more digging, I'm wondering if the place to do this wouldn't be in the
 PdfCopy.PageStamp class? It seems like PageStamp.alterContents() could do
 the same flattening operation that PdfStamper does.

 The ideal would be to factor out the behavior so the code isn't duplicated
 in both PdfCopy and PdfStamper...

I guess I have the larger question of exactly what parsing is.
That is, it seems that generally you use itext to 1) read in something, often
an existing pdf, 2) do some stuff, then 3) write out a pdf. Presumably
as you go through step 2, you are assembling or compiling a bunch
of structures that allow you to do step 3 but are more optimized
for manipulating and editing the nascent PDF.
If I understand your earlier comments, you apparently don't actually
have a generic PDF parser to do step 1 that works with all sequences
you could put into step 2. Now, of course, more generally the
above approach doesn't scale, as you would always hope to stream
to some extent: read what you need, write what you can, etc.
However, that could probably be hidden somewhat in the implementation
of the classes for each step.
So, instead of things like PdfCoolFeature.doSomething(byte[] pdffile)
you have PdfCoolFeature.doSomething(ParsedPdfOperand pdflikething),
where the second signature takes a parameter that is generally
optimized for a broad class of common operations.
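
The shape of that idea can be shown with a toy in Python. Nothing here is
iText code or real PDF handling. ParsedPdf, flatten and stamp are hypothetical
stand-ins: a "document" is just pages of text separated by form feeds, and the
parse counter exists only to show that chained steps can share one parsed form
without a serialize and re-parse round trip in between.

```python
class ParsedPdf:
    """Hypothetical in-memory form of a document: a list of page strings."""

    parse_count = 0  # how many times bytes were parsed, for illustration only

    def __init__(self, pages):
        self.pages = list(pages)

    @classmethod
    def parse(cls, data):
        """Build the in-memory form from bytes (pages separated by form feeds)."""
        cls.parse_count += 1
        return cls(data.decode("utf-8").split("\f"))

    def serialize(self):
        """Turn the in-memory form back into bytes."""
        return "\f".join(self.pages).encode("utf-8")


def flatten(doc):
    """Stand-in for form flattening: replace the field marker with a value."""
    return ParsedPdf(page.replace("[field]", "value") for page in doc.pages)


def stamp(doc, footer):
    """Stand-in for stamping: append the footer's first page to every page."""
    return ParsedPdf(page + "\n" + footer.pages[0] for page in doc.pages)
```

With these signatures, stamp(body, flatten(footer)) parses each input once. A
bytes-in, bytes-out API would have to serialize the flattened footer and parse
it again before stamping.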
 

 Does anyone see any technical issues with this as a strategy?

 - K


 'Kevin Day' wrote:








 I've been doing some digging into the performance question that Giovanni
 Azua has posted about.
  
 Some of his findings (using StringBuilder, etc...) are solid improvements
 to overall iText performance - however, the crux of the performance
 difference he is seeing between iText and the competing solution is not
 low level.  It's a high level issue.
  
 Here's what's going on:
  
 His specific use case involves stamping headers and footers onto
 pages.  The footer contains AcroFields that must be flattened prior
 to stamping.
  
 The performance hit is coming from the fact that, in order to flatten and
 apply the footer, he is having to:
  
 1.  Construct a PDF using PdfStamper
 2.  Write output to a byte array output stream
 3.  Re-parse the BAOS into a PdfReader
 4.  Import the page from the reader for use as a stamp
  
 While this is functional, it is certainly not performant.
  
 A much, much faster technique would be to do the flattening to the
 *reader*, then just import the page to the output writer.  This
 avoids the awkward creation of the temporary PdfReader.
  
  
 So, the performance delta is not caused so much by iText's low level
 implementation (although the performance improvements that Giovanni has
 suggested will help to make iText even faster than it already is) - the
 delta is really caused by an awkward operation forced on the user by the
 framework.
  
  
 So, are there any fundamental reasons to not do flattening, etc... to the
 PdfReader?  My first look at the code indicates that it may be
 possible to factor this out of PdfStamper (basically, instead of adjusting
 the AcroFields dictionary and content streams in the
 PdfStamper/PdfCopy/etc... output, we'd make those adjustments to the
 PdfReader).
  
 I'm thinking of something along the lines of:
  
 PdfFormFlattener(PdfReader).flatten(pageNumber)
  
 Maybe with supplemental methods for flattenNamedFields(pageNumber),
 flattenFieldsOfType(pageNumber)
  
 Thoughts?
  
 - K
  




Re: [iText-questions] Performance when flattening form fields

2010-04-26 Thread Mike Marchywka














 Date: Mon, 26 Apr 2010 08:14:32 -0700
 From: forum_...@trumpetinc.com
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] Performance when flattening form fields


 Mike - can we please reserve this thread for a technical discussion of the
 merits of the proposal?

 I'd be happy to have a conversation in a separate thread regarding how iText
 works.

The merits of the proposal seem to relate to how itext works, no?
That is, you are talking about the problem of reducing something
to a pdf file solely to read it back in and reparse it. If you
didn't have to write out a pdf file, if you could pass around
the internal thing, you would seem to save some time. Isn't that what
you are proposing to attack?
 
 
 
 

 - K


 Mike Marchywka-2 wrote:












 
 Date: Sun, 25 Apr 2010 22:14:02 -0700
 From: forum_...@trumpetinc.com
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] Performance when flattening form fields


 After more digging, I'm wondering if the place to do this wouldn't be in
 the
 PdfCopy.PageStamp class? It seems like PageStamp.alterContents() could do
 the same flattening operation that PdfStamper does.

 The ideal would be to factor out the behavior so the code isn't
 duplicated
 in both PdfCopy and PdfStamper...

 I guess I have the larger question of exactly what parsing is?
 That is, it seem generally you use itext to 1) read in somthing, often
 an existing pdf, 2) do some stuff, then 3) write out a pdf. Presumably
 as you go through step 2, you are assembling or compiling a bunch
 of structures that allow you to do step 3 but are more optimized
 for manipulation and editing the nascent PDF.
 If I understand your earlier comments, you apparently don't actually
 have a generic PDF parser to do step 1 that works with all sequences
 you could put into step 2. Now, of course, more generally the
 above approach doesn't scale as you would always hope to stream
 to some extent- read what you need, write what you can etc.
 However, that could probably be hidden somewhat into the implementation
 for classes for each step.
 So, instead of things like PdfCoolFeature.doSomething(byte[] pdffile)
 you have PdfCoolFeadture.doSomething( ParsedPdfOperand pdflikething)
 where the second signature take a parameter that is generally
 optimized for a broad class of common operations.


 Does anyone see any technical issues with this as a strategy?

 - K


 'Kevin Day' wrote:








 I've been doing some digging into the performance question that Giovanni
 Azua has posted about.

 Some of his findings (using StringBuilder, etc...) are solid
 improvements
 to overall iText performance - however, the crux of the performance
 difference he is seeing between iText and the competing solution is not
 low level. It's a high level issue.

 Here's what's going on:

 His specific use case involves stamping headers and footers onto
 pages. The footer contains AcroFields that must be flattened prior
 to stamping.

 The performance hit is coming from the fact that, in order to flatten
 and
 apply the footer, he is having to:

 1. Construct a PDF using PdfStamper
 2. Write output to a byte array output stream
 3. Re-parse the BAOS into a PdfReader
 4. Import the page from the reader for use as a stamp

 While this is functional, it is certainly not performant.

 A much, much faster technique would be to do the flattening to the
 *reader*, then just import the page to the output writer. This
 avoids the awkward creation of the temporary PdfReader.


 So, the performance delta is not caused so much by iText's low level
 implementation (although the performance improvements that Giovanni has
 suggested will help to make iText even faster than it already is) - the
 delta is really caused by an awkward operation forced on the user by the
 framework.


 So, are there any fundamental reasons to not do flattening, etc... to
 the
 PdfReader? My first look at the code indicates that it may be
 possible to factor this out of PdfStamper (basically, instead of
 adjusting
 the AcroFields dictionary and content streams in the
 PdfStamper/PdfCopy/etc... output, we'd make those adjustments to the
 PdfReader).

 I'm thinking of something along the lines of:

 PdfFormFlattener(PdfReader).flatten(pageNumber)

 Maybe with supplemental methods for flattenNamedFields(pageNumber),
 flattenFieldsOfType(pageNumber)

 Thoughts?

 - K




 --

 ___
 iText-questions mailing list
 iText-questions@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/itext-questions

 Buy the iText book: http://www.itextpdf.com/book/
 Check the site with examples before you ask questions:
 http://www.1t3xt.info/examples/
 You can also search the keywords list:
 http://1t3xt.info/tutorials/keywords/



Re: [iText-questions] Performance when flattening form fields

2010-04-25 Thread Mike Marchywka

Date: Sun, 25 Apr 2010 10:58:06 -0700
From: ke...@trumpetinc.com
To: itext-questions@lists.sourceforge.net
Subject: [iText-questions] Performance when flattening form fields

I've been doing some digging into the performance question that Giovanni Azua
has posted about.

Some of his findings (using StringBuilder, etc...) are solid improvements to
overall iText performance - however, the crux of the performance difference
he is seeing between iText and the competing solution is not low level. It's
a high level issue.

Here's what's going on:

His specific use case involves stamping headers and footers onto pages. The
footer contains AcroFields that must be flattened prior to stamping.

The performance hit is coming from the fact that, in order to flatten and
apply the footer, he is having to:

1. Construct a PDF using PdfStamper

2. Write output to a byte array output stream

3. Re-parse the BAOS into a PdfReader

4. Import the page from the reader for use as a stamp

While this is functional, it is certainly not performant.

A much, much faster technique would be to do the flattening to the *reader*,
then just import the page to the output writer. This avoids the awkward
creation of the temporary PdfReader.

So, is there no internal representation of a PDF document that you can pass
around without converting it to a file format? If I understand you, you are
saying that he is forced to serialize a bunch of structures into
a PDF file just so he can re-parse that file back into an
internal set of structures for further work?
How do you know the other package doesn't have to do this too?

Is this only an issue with flattening, or is that just the specific
problem here, and other operations may run into similar problems?

So, the performance delta is not caused so much by iText's low level
implementation (although the performance improvements that Giovanni has
suggested will help to make iText even faster than it already is) - the delta
is really caused by an awkward operation forced on the user by the framework.

So, are there any fundamental reasons to not do flattening, etc... to the
PdfReader? My first look at the code indicates that it may be possible to
factor this out of PdfStamper (basically, instead of adjusting the AcroFields
dictionary and content streams in the PdfStamper/PdfCopy/etc... output, we'd
make those adjustments to the PdfReader).

I'm thinking of something along the lines of:

PdfFormFlattener(PdfReader).flatten(pageNumber)

Maybe with supplemental methods for flattenNamedFields(pageNumber),
flattenFieldsOfType(pageNumber)

Thoughts?

- K


Re: [iText-questions] performance follow up

2010-04-24 Thread Mike Marchywka












 From: brave...@gmail.com
 Date: Sat, 24 Apr 2010 13:05:26 +0200
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] performance follow up



 Hello,

 On Apr 23, 2010, at 10:50 PM, trumpetinc wrote:
 Don't know if it'll make any difference, but the way you are reading the file
 is horribly inefficient. If the code you wrote is part of your test times,
 you might want to re-try, but using this instead (I'm just tossing this
 together - there might be type-os):

 ByteArrayOutputStream baos = new ByteArrayOutputStream();
 byte[] buf = new byte[8092];
 int n;
 while ((n = is.read(buf)) >= 0) {
     baos.write(buf, 0, n);
 }
 return baos.toByteArray();

 I tried your suggestion above and it made no significant difference compared
 to doing the loading from iText. The fastest I could get my use case to work
 using this pre-loading concept was by loading the whole file in one shot
 using the code below.

If, as indicated below, you are generally IO limited, don't throw
the code out yet. If you must copy data, you want to use array-based
methods as often as possible, but the first preference
is to avoid copies altogether, unless of course you are strategically
preloading or something.

I often just turn everything into a byte array, but
obviously this doesn't scale too well unless you are content to let
the VM do your swapping for you. Ideally you would just load what you
need in a just-in-time fashion to avoid tying up idle RAM.
 

 Applying the cumulative patch plus preloading the whole PDF using the code 
 below, my original test-case now performs 7.74% faster than before, roughly 
 22% away from competitor now ...

 btw the average response time numbers I was getting:

 - average response time of 77ms original unchanged test-case from the office 
 multi-processor-multi-core workstation
 - average response time of 15ms original unchanged test-case from home using 
 my MBP

 I attribute the huge difference between those two similar experiments mainly
 to having an SSD drive in my MBP ... the top hot spots reported by the
 profiler are related one way or another to IO, so it would be no wonder that
 with an SSD drive the response time improves by a factor of 5x. There are
 other differences though, e.g. OS, JVM version.

 
Multi-proc and disk cache can cause some confusion. I wouldn't ignore
task manager for some initial investigations- if the CPU drops and the disk
light comes on you are likely to be disk limited. With IO it is easy
to get nickel-and-dimed to death, as everyone who relays the data
can be low on the profile chart but it adds up. Wall-clock times are least
susceptible to manipulation and may be best for A-B comparisons
if you have control over other stuff running on the machine ( cash flow versus
pro-forma earnings LOL). If you can subclass
the random access file thing you may be able to first collect statistics
and then write something that can see into the future a few milliseconds.
All the generic caches work on past results, things like MRU, except maybe the
prefetch,
which assumes you will continue to do sequential memory accesses. If you
are in a position to make forward looking statements that have a material
impact on your performance ( ROFL) you may be able to
do much better.
 
 

 Best regards,
 Giovanni

 private static byte[] file2ByteArray(String filePath) throws Exception {
     InputStream input = null;
     try {
         File file = new File(filePath);
         input = new BufferedInputStream(new FileInputStream(filePath));

         byte[] buff = new byte[(int) file.length()];
         input.read(buff);

         return buff;
     }
     finally {
         if (input != null) {
             input.close();
         }
     }
 }
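
A caveat on the one-shot load above: InputStream.read(byte[]) is permitted to return before the array is full, so a single call can leave the tail of the buffer as zeros. Below is a minimal sketch of a variant that keeps reading until the array is filled, assuming the file fits in memory. The class and method names are illustrative only and are not part of iText or of the original post.

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

final class FilePreloader {
    /** Loads the whole file into a byte array, looping until every byte is read. */
    static byte[] readFully(File file) throws IOException {
        long length = file.length();
        if (length > Integer.MAX_VALUE) {
            throw new IOException("File too large to preload: " + file);
        }
        byte[] data = new byte[(int) length];
        try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
            // readFully() blocks until data.length bytes have been read,
            // or throws EOFException if the file is shorter than expected.
            in.readFully(data);
        }
        return data;
    }
}
```

The resulting array can then be handed to whatever consumer accepts a byte[] source, which is the pre-loading idea discussed in this thread.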

 

Re: [iText-questions] performance follow up

2010-04-24 Thread Mike Marchywka



Isn't there something in PDF about linearization? ( the term
comes up as a suggestion on google, LOL). How
can you compare the two resulting PDFs in terms of
dynamic attributes or arbitrary ordering of some items? Given the issues with
IO and access
patterns this could matter. In fact, you could
even imagine that if you could reorder some things
you get a win-win for creation and future rendering time.

What is the extent of the freedom here? It sounds like
any hints you would generate for a reader could be used
during document manipulation in iText.







 Date: Sat, 24 Apr 2010 11:59:14 -0700
 From: forum_...@trumpetinc.com
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] performance follow up


 If the file is being entirely pre-loaded, then I doubt that IO blocking is a
 significant contributing factor to your test.

 I think that the best clue here may be the difference between performance
 with form flattening and without form flattening. Just to confirm, am I
 right in saying that iText outperforms the competitor by a significant
 amount in the non-flattening scenario? If that's the case, then it seems
 like we should see significant differences in the profiling results between
 the flattening and non-flattening scenarios in iText.

 Would you be willing to post the profiling results for both cases so we can
 see which code paths are consuming the most runtime in each?

 Another possibility if the profiling results show similar hotspots is that
 the form flattening algorithms in iText are using the hotspot areas a lot
 more than in the non-flattening case. There may be a bunch of redundant
 reads or something in the flattening case.

 Let's take a look at the profiling results and see if we can draw any
 conclusions about where to go next.

 BTW - which profiler are you using? Are you able to expand each of the
 hotspot code paths and see the actual call path that is causing the
 bottleneck? I use jvvm, and the results of expanding the hotspot call trees
 can be quite illuminating.

 What I really would like is to get ahold of your two benchmark tests (with
 and without flattening) so I can run it on my system - do you have anything
 you can package up and share?

 - K


 Giovanni Azua-2 wrote:

 Hello,

 On Apr 23, 2010, at 10:50 PM, trumpetinc wrote:
 Don't know if it'll make any difference, but the way you are reading the
 file
 is horribly inefficient. If the code you wrote is part of your test
 times,
 you might want to re-try, but using this instead (I'm just tossing this
 together - there might be type-os):

 ByteArrayOutputStream baos = new ByteArrayOutputStream();
 byte[] buf = new byte[8092];
 int n;
 while ((n = is.read(buf)) >= 0) {
     baos.write(buf, 0, n);
 }
 return baos.toByteArray();

 I tried your suggestion above and it made no significant difference
 compared to doing the loading from iText. The fastest I could get my use
 case to work using this pre-loading concept was by loading the whole file
 in one shot using the code below.

 Applying the cumulative patch plus preloading the whole PDF using the code
 below, my original test-case now performs 7.74% faster than before,
 roughly 22% away from competitor now ...

 btw the average response time numbers I was getting:

 - average response time of 77ms original unchanged test-case from the
 office multi-processor-multi-core workstation
 - average response time of 15ms original unchanged test-case from home
 using my MBP

 I attribute the huge difference between those two similar experiments
 mainly to having an SSD drive in my MBP ... the top hot spots reported
 by the profiler are related one way or another to IO, so it would be no
 wonder that with an SSD drive the response time improves by a factor of
 5x. There are other differences though, e.g. OS, JVM version.

 Best regards,
 Giovanni

 private static byte[] file2ByteArray(String filePath) throws Exception {
 InputStream input = null;
 try {
 File file = new File(filePath);
 input = new BufferedInputStream(new FileInputStream(filePath));

 byte[] buff = new byte[(int) file.length()];
 input.read(buff);

 return buff;
 }
 finally {
 if (input != null) {
 input.close();
 }
 }
 }




Re: [iText-questions] performance follow up

2010-04-23 Thread Mike Marchywka

From: brave...@gmail.com
Date: Fri, 23 Apr 2010 12:23:50 +0200
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] performance follow up

Hello Mike,

On Apr 23, 2010, at 12:55 AM, Mike Marchywka wrote:

Mark Twain gets to the front so quickly. Again, I'm not suggesting
you did anything wrong or bad, I haven't actually checked numbers
or given the specific test a lot of thought- 9 data points is usually
not all that conclusive in any case and I guess that's my point.

There are 10 means, each mean comes from 1K data points, so there are 10K
data points for each version tested, not just 9

I thought you had 9 test cases but 9 or 10 doesn't matter much.

Unlike other tests of significance, t-test doesn't need a large number of
observations. It is actually this case of few observations e.g. 10

yeah, personally I've never liked nonparametrics and other approaches
that magically work with a few samples. Generally they treat the
small sample cases by dealing with outliers ( outliars LOL? ) more
gracefully. However, if you make some assumptions about population
statistics and run monte carlo you can see how often your 9 or 10 points
with your chosen test lead to misleading results. This is a bit of
an inverse problem- you have data with contributions from many sources
( ie noise ) and you are trying to estimate some underlying clean
number- known approaches to this have limits.

means one of its main use-cases. Indeed one would need to check the
assumptions of independence and normality. Looking at the response times

Cache warmness could lead to lots of run-to-run dependence depending
on what you measure, and I
know personally I see this with the second invocation of various command
line programs. If, as you suggest, the distro is normal, maybe it is
just a bunch of random junk, but sometimes you see multi-modal
and each peak is probably due to something interesting.

In any case, the point here is to make the code faster and use
the stats or whatever other measures you have to point the way.

Best regards,
Giovanni

Re: [iText-questions] performance follow up

2010-04-23 Thread Mike Marchywka












 From: brave...@gmail.com
 Date: Fri, 23 Apr 2010 12:55:03 +0200
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] performance follow up


 On Apr 22, 2010, at 11:18 PM, trumpetinc wrote:

 I like your approach! A simple if (ch > 32) return false; at the very top
 would give the most bang for the least effort (if you do go the bitmask
 route, be sure to include unit tests!).


 Doing this change saves approximately two seconds out of the full workload,
 so it now shows 8s instead of 10s, and isWhitespace stays at 1%.

 The numbers below include two extra changes: the one from trumpetinc above
 and migrating all StringBuffer references to StringBuilder.

 The top are now:

 Method                         Share  Time  Invocations
 PRTokeniser.nextToken             8%   77s   19'268'000
 RandomAccessFileOrArray.read      6%   53s  149'047'680
 MappedRandomAccessFile.read       3%   26s   61'065'680
 PdfReader.removeUnusedCode        1%   15s         6000
 PdfEncodings.convertToBytes       1%   15s    5'296'207
 PRTokeniser.nextValidToken        1%   12s    9'862'000
 PdfReader.readPRObject            1%   10s    5'974'000
 ByteBuffer.append(char)           1%   10s   19'379'382
 PRTokeniser.backOnePosition       1%   10s   17'574'000
 PRTokeniser.isWhitespace          1%    8s   35'622'000

 A bit further down there is ByteBuffer.append_i, which often needs to
 reallocate and do an array copy, thus the expensive ByteBuffer.append(char)
 above ... I am playing right now with bigger initial sizes, e.g. 512 instead
 of 127 ...

I had a draft message I never sent regarding this. Essentially don't
call append, find the end then call String(byte[],offset,length)
or better ( this gets involved ) don't make temp strings just
pass around indexes ( you need to give this some thought, but my
post was getting quite confusing so I scrapped it).
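
The "find the end, then build the string once" idea can be sketched roughly as follows. This is only an illustration: TokenSlicer and its methods are hypothetical names, the delimiter test covers just the six PDF whitespace bytes, and the real PRTokeniser rules are richer than this.

```java
import java.nio.charset.StandardCharsets;

final class TokenSlicer {
    /** True for the six PDF whitespace bytes: NUL, HT, LF, FF, CR, SP. */
    static boolean isWhitespace(byte b) {
        return b == 0 || b == 9 || b == 10 || b == 12 || b == 13 || b == 32;
    }

    /** Scans forward from start and returns the index just past the token. */
    static int tokenEnd(byte[] buf, int start) {
        int end = start;
        while (end < buf.length && !isWhitespace(buf[end])) {
            end++;
        }
        return end;
    }

    /** Builds the token with one array-based constructor call, no per-char append. */
    static String tokenAt(byte[] buf, int start) {
        int end = tokenEnd(buf, start);
        return new String(buf, start, end - start, StandardCharsets.ISO_8859_1);
    }
}
```

Callers that only need to compare or skip a token could stop at tokenEnd() and never create the String at all.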
 
 
 
 
 

 Best regards,
 Giovanni

Re: [iText-questions] performance follow up

2010-04-23 Thread Mike Marchywka

This is the draft I mentioned earlier. It was getting a bit
convoluted due to over-qualifying each assertion,
but if you are using appends a lot, consider the basic
idea of finding the delims FIRST, then doing one or
more array ops, or avoiding string creation altogether.
I don't have any idea what you are doing with these
strings you parse, but if you are building dictionaries, consider
things like the following.
On large dictionaries with coherent access patterns,
hash tables may not be as efficient as sorted things
with the right indexing ( this may not be apparent until
you start VM thrashing, but if you have ordered queries on
static dictionaries, a sparse hash can make a mess of a
cache compared to a well thought out b-search on
a compact representation of your strings). I'm not
entirely sure the multi-pass approach I try to
outline below has a lot of merit, but you would
need to consider some issues along these lines.

From: brave...@gmail.com
Date: Fri, 23 Apr 2010 12:27:42 +0200
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] performance follow up

Hello Paulo,

On Apr 22, 2010, at 11:43 PM, Paulo Soares wrote:
FYI I already use a table to map the char to the result for the delimiter
testing and the speed improvement was zero in relation to plain comparisons.

Paulo

You are right ... changing to a table makes no difference. I checked this
with the profiler and the results stay the same.

Why does that method take an int param vs char or better a byte?
Implicit casts are not normally free, probably look up table
needs to convert array index to int anyway but if you are
doing specific booleans comparing byte to byte you may be able
to avoid some JVM junk. In any case, the method code could
hide that if needed at all.

As should be clear, I'm not familiar with the code and
don't have it in front of me, but a few thoughts.
Often reordering operations can help, but it may not
be obvious a priori which approach is best.
Multiple passes are generally bad compared to working
on blocks that preserve locality and maximize low level
memory cache hits. However, due to other
issues it could make sense, or at least multiple
passes in small blocks.
You could consider inlining this method in one place along
with any similar ones
and making a classification pass during which you scan each char in your
input data and create a class for it. Then make a second pass
through your now huge data in which each char is followed by its
class, and have processing based on a big switch statement
that switches on the class and whatever state info you have made.
Or, consider building a table of whitespace locations on your
first pass etc etc. If you are currently going through calling something
like an append(char) method on each char, you may be better off finding limits
and creating a new string with String(byte[], offset, length) etc.

Also, presumably you find token limits and then make strings.
Is it possible to avoid creating strings at all and just pass
around indexes into a byte array? This may require massive code
changes all over and, depending on what you do with the strings, may or may not
help much, as many common operations may be expected to be optimized
in native code for strings. However, if you have huge hash tables, each look up
may be cheap
to compute but each one also trashes the memory cache. You may be
better off with ordered index structs that you can implement in java
with byte[] more easily than strings.
And, of course, don't ignore obvious data dependent optimizations.
If you have strings with a long common prefix like http://www, then removing
this from compares could be a big help with memory and speed.
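
As a rough illustration of the "ordered index over byte[] instead of strings" idea, here is a sketch that looks up a slice of a parse buffer in a sorted key table by binary search, without creating a String for the slice. SortedByteKeys is a hypothetical class invented for this example; whether it would actually beat a hash table in iText is an open question that only measurement can answer.

```java
final class SortedByteKeys {
    private final byte[][] keys; // must be sorted ascending, unsigned byte order

    SortedByteKeys(byte[][] sortedKeys) {
        this.keys = sortedKeys;
    }

    /** Returns the index of the key equal to buf[off, off+len), or -1 if absent. */
    int find(byte[] buf, int off, int len) {
        int lo = 0;
        int hi = keys.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int c = compare(keys[mid], buf, off, len);
            if (c < 0) {
                lo = mid + 1;
            } else if (c > 0) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }

    /** Unsigned lexicographic comparison of a key against a buffer slice. */
    static int compare(byte[] key, byte[] buf, int off, int len) {
        int n = Math.min(key.length, len);
        for (int i = 0; i < n; i++) {
            int d = (key[i] & 0xFF) - (buf[off + i] & 0xFF);
            if (d != 0) {
                return d;
            }
        }
        return key.length - len;
    }
}
```

The lookup works directly on offsets into the original buffer, so the only allocations are the key table itself.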

Best regards,
Giovanni

Re: [iText-questions] performance follow up

2010-04-23 Thread Mike Marchywka










 Date: Fri, 23 Apr 2010 09:43:08 -0700
 From: forum_...@trumpetinc.com
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] performance follow up


 Yes - it needs to be int. Regardless, we need to focus on the things that

 
So you are doing everything internally with 32 bit chars?
Not a big deal but if these are mostly zero there may be 
better ways to represent and save memory. You may say, well
RAM is cheap but that doesn't matter since low level caches
are fixed but I guess you can get a bigger disk and say VM is unlimited.
 
 
 
 
 are actually consuming run time, and this method isn't one of them (no
 matter how much it could be optimized).

The only person with data claimed otherwise :)
 
 
 


 Mike Marchywka-2 wrote:




 Does this have to be int vs char or byte? I think earlier I suggested
 operating on byte[] instead of making a bunch of temp strings
 but I don't know the context well enough to know if this makes sense.
 Certainly De Morgan can help but casts and calls are not free either.

 Also, maybe hotspot runtime has gotten better but I have found in
 the past that look up tables can quickly become competitive
 with bit operators ( if your param is byte instead of int, a
 256 entry table can tell you which classes the byte is a member of).
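
 A minimal sketch of such a 256-entry table, using the whitespace and delimiter characters defined by the PDF syntax. ByteClasses and its constants are hypothetical names for illustration; this is not how PRTokeniser is written, and Paulo reports elsewhere in the thread that a table gave no measurable gain there.

```java
final class ByteClasses {
    static final int WHITESPACE = 1;
    static final int DELIMITER = 2;
    static final int DIGIT = 4;

    // One entry per possible byte value; each entry is a bitmask of classes.
    private static final byte[] TABLE = new byte[256];

    static {
        for (int c : new int[] {0, 9, 10, 12, 13, 32}) {
            TABLE[c] |= WHITESPACE;
        }
        for (char c : "()<>[]{}/%".toCharArray()) {
            TABLE[c] |= DELIMITER;
        }
        for (int c = '0'; c <= '9'; c++) {
            TABLE[c] |= DIGIT;
        }
    }

    /** Returns the class bitmask for a byte (masking makes the index unsigned). */
    static int classOf(byte b) {
        return TABLE[b & 0xFF];
    }

    /** True if the byte belongs to at least one of the classes in mask. */
    static boolean is(byte b, int mask) {
        return (TABLE[b & 0xFF] & mask) != 0;
    }
}
```

 A single lookup answers several membership questions at once, for example is(b, WHITESPACE | DELIMITER) for "does this byte end a token".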





Re: [iText-questions] performance follow up

2010-04-23 Thread Mike Marchywka












 Date: Fri, 23 Apr 2010 10:29:43 -0700
 From: forum_...@trumpetinc.com
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] performance follow up


 This tells us that the focus needs to be on PRTokeniser and RAFOA. For what
 it's worth, these have also shown up as the bottlenecks in profiling I've
 done in the parser package (not too surprising).

 I'll discuss each in turn:

 RandomAccessFileOrArray - I've done a lot of thinking on this one in the
 past. This is an interesting class that can have massive impact on
 performance, depending on how you use it. For example, if you load your
 source PDF entirely into memory, and pass the bytes into RAFOA, it will
 remove IO bottlenecks.


I mentioned IO early on as a neglected task where you just pull some
generic thing out and let it hand you a byte at a time, clearly
if you know your access pattern and can make it coherent you stand
to gain a lot. Caching and VM can only guess what you will do next,
you may know better :)
 
 
 
 The one problem with the memory mapped strategy (in its current
 implementation) is that really big source PDFs still can't be loaded into
 memory. This could be addressed by using a paging strategy on the mapped

You can have alt implementations in the mean time if you know
size a priori. Ideally you would
like to be able to operate on a stream and scrap random access.
 
 
 will use memory mapped IO is the Document.plainRandomAccess static public
 variable (shudder).

As if 2 people would use this at the same time ROFL :)
It's bad enough you don't have globals...
 
 




 So what about the code paths in PRTokeniser.nextToken()?

 We've got a number of tight loops reading individual characters from the
 RAFOA. If the backing source isn't buffered, this would be a problem, but I
 don't know that is really the issue here (it would be worth measuring
 though...)

 The StringBuffer could definitely be replaced with a StringBuilder, and it
 could be re-used instead of re-allocating for each call to nextToken()
 (this would probably help quite a bit, as I'll bet the default size of the
 backing buffer has to keep growing during dictionary reads).
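
 A small sketch of the re-use idea, assuming a single-threaded tokeniser: one StringBuilder is kept as a field and reset with setLength(0), so its backing array survives between calls. ReusableTokenBuffer is a hypothetical name, and the 512 initial capacity is simply the figure Giovanni mentioned trying, not a tuned value.

```java
final class ReusableTokenBuffer {
    // Allocated once; setLength(0) empties it without shrinking the backing array.
    private final StringBuilder sb = new StringBuilder(512);

    /** Copies src[start, end) into the shared builder and returns it as a String. */
    String copyToken(byte[] src, int start, int end) {
        sb.setLength(0);
        for (int i = start; i < end; i++) {
            sb.append((char) (src[i] & 0xFF));
        }
        return sb.toString();
    }
}
```

 Because the builder is shared state, an instance like this must not be used from several threads at once.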

 
Again, why even do this? Find the delimiters and, if you must make
a string, make it only when you know start and end; don't keep
looping with append(char) no matter how nice the source code looks.
If you can use anything with [] in the sig it stands a chance
of being faster. Pass as much a priori info to the library classes
as you can- append means "whoops, I found ANOTHER thing to add" when
you already have the data; just say "here is the string I need".
And unless you actually need the tokens as strings, consider
just returning indexes or something. You may or may not
need strings all the time, it may be possible to use int[] or something.
 
 

 Another thing that could make a difference is ordering of the case and if's
 - for example, the default: branch turns around and does a check for (ch ==
 '-' || ch == '+' || ch == '.' || (ch >= '0' && ch <= '9')). Changing this to
 be:

 case '-':
 case '+':
 case '.':
 case '0':
 ...
 case '9':

 
Actually optimizing compilers do stuff like this- you have some
known, assumed, or measured branching probability and make
the common ones faster ( minimize expectation value of execution time).
I haven't checked lately but IIRC the compiler tries to make
a switch into a jump table.
 

 May be better.
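
 Spelled out in full, the suggested case labels would look roughly like the sketch below. NumberStart.startsNumber is a hypothetical helper written for illustration; in PRTokeniser the labels would sit directly in the existing switch rather than in a separate method.

```java
final class NumberStart {
    /** True if ch can begin a numeric token: sign, decimal point, or digit. */
    static boolean startsNumber(int ch) {
        switch (ch) {
            case '-':
            case '+':
            case '.':
            case '0':
            case '1':
            case '2':
            case '3':
            case '4':
            case '5':
            case '6':
            case '7':
            case '8':
            case '9':
                return true;
            default:
                return false;
        }
    }
}
```

 Whether this is faster than the chained comparisons depends on how the compiler and JIT lower the switch, so it is worth profiling rather than assuming.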


 The loops that check for while (ch != -1 && ((ch >= '0' && ch <= '9') || ch
 == '.')) could also probably be optimized by removing the ch != -1 check
 - the other conditions ensure that the loop will escape if ch == -1.
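
 The reasoning is that -1 is neither a digit nor '.', so the remaining condition is already false at end of input. A minimal sketch that exercises this, where readNumber is a hypothetical stand-in for the loops in PRTokeniser.nextToken() and a ByteArrayInputStream plays the role of the RAFOA:

```java
import java.io.ByteArrayInputStream;

final class NumberScanner {
    /** Reads digits and '.' from the stream; stops at any other byte or at EOF. */
    static String readNumber(ByteArrayInputStream in) {
        StringBuilder sb = new StringBuilder();
        int ch = in.read(); // returns -1 at end of input
        // No explicit ch != -1 test: -1 fails both the digit range and the '.' check.
        while ((ch >= '0' && ch <= '9') || ch == '.') {
            sb.append((char) ch);
            ch = in.read();
        }
        return sb.toString();
    }
}
```

 The loop terminates on input that ends in the middle of a number, which is the case the removed check was guarding against.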


 It might be interesting to break the top level parsing branches into
 separate functions so the profiler can tell us which of these main branches
 is consuming the bulk of the run time.


 Those are the obvious low hanging fruit that I see.

 Final point: I've seen some comments suggesting inlining of some code.
 Modern VMs are quite good at doing this sort of inlining automatically - a
 test would be advisable before worrying about it too much. Having things
 split out actually makes it easier to use a profiler to determine where the
 bottleneck is.


 One thing that is quite clear here is that we need to have some sort of
 benchmark that we can use for evaluation - for example, if I had a good
 benchmark test, I would have just tried the ideas above to see how they
 fared.

If you have a set of benchmarks, you can afford to measure them
according to things you think will impact execution time. Then, with
enough of them, you can fit execution time to your measurements using alt
pieces of code. This is where you start to find statistics helpful
( a pdf with value X for attribute A incurs a time penalty of
n seconds per increment of X ). FWIW, there is also something
about identical, independent entities for a statistical sample. If you can
measure these they aren't identical.
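
For the simplest case of one attribute, "fitting execution time to your measurements" could be an ordinary least-squares line, sketched below. TimeFit is a hypothetical helper and the numbers in the usage note are made up for illustration, not benchmark results from this thread.

```java
final class TimeFit {
    /** Returns {slope, intercept} of the least-squares line y = slope * x + intercept. */
    static double[] fitLine(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) pairs");
        }
        double n = x.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < x.length; i++) {
            sx += x[i];
            sy += y[i];
            sxx += x[i] * x[i];
            sxy += x[i] * y[i];
        }
        double denom = n * sxx - sx * sx;
        if (denom == 0) {
            throw new IllegalArgumentException("all x values are identical");
        }
        double slope = (n * sxy - sx * sy) / denom;
        double intercept = (sy - slope * sx) / n;
        return new double[] {slope, intercept};
    }
}
```

With x as, say, the number of form fields per test PDF and y as the measured time, the slope estimates the time penalty per extra field and the intercept the fixed overhead.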
 
 
 The top are now:

Re: [iText-questions] AWESOME! performance follow up

2010-04-23 Thread Mike Marchywka




btw, why do these need to get changed to ints?
and, do you notice with task manager that
CPU is not at 100 pct? This often indicates
disk limit- either explicit IO or VM.
 
I've actually had c++ code that I thought
was computationally limited turn out to 
be IO limited. Often simple compression is
well worth the effort. Not just for explicit
disk transfer rates, but for saving memory
to keep more things in low level cache
or out of VM. 

And, again, more generally prefer things
with [] in sig, including string ops if
you even need strings instead of [].




 Date: Fri, 23 Apr 2010 13:50:49 -0700
 From: forum_...@trumpetinc.com
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] AWESOME! performance follow up


 Don't know if it'll make any difference, but the way you are reading the file
 is horribly inefficient. If the code you wrote is part of your test times,
 you might want to re-try, but using this instead (I'm just tossing this
 together - there might be type-os):

 ByteArrayOutputStream baos = new ByteArrayOutputStream();
 byte[] buf = new byte[8092];
 int n;
 while ((n = is.read(buf)) >= 0) {
     baos.write(buf, 0, n);
 }
 return baos.toByteArray();


 If loading the file into main memory makes any difference, that difference
 will be a measure of the impact of virtual-native interface interaction.
 In effect, this is telling us whether the calls to file.read() should be
 replaced with file.read(byte[]).
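
 One way to see what is at stake is to count how many read calls each style makes for the same data. The sketch below is only a stand-in: ReadCallCounter is a hypothetical wrapper, it counts Java-level calls on an in-memory stream, and it does not measure the actual cost of crossing into native code.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadCallCounter extends FilterInputStream {
    int calls; // number of read requests passed to the wrapped stream

    ReadCallCounter(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        calls++;
        return super.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        calls++;
        return super.read(b, off, len);
    }

    /** Drains the data one byte per call and returns the number of calls made. */
    static int callsByteAtATime(byte[] data) throws IOException {
        ReadCallCounter in = new ReadCallCounter(new ByteArrayInputStream(data));
        while (in.read() != -1) {
            // discard the byte; only the call count matters here
        }
        return in.calls;
    }

    /** Drains the data in blocks of blockSize and returns the number of calls made. */
    static int callsInBlocks(byte[] data, int blockSize) throws IOException {
        ReadCallCounter in = new ReadCallCounter(new ByteArrayInputStream(data));
        byte[] buf = new byte[blockSize];
        while (in.read(buf) >= 0) {
            // discard the block; only the call count matters here
        }
        return in.calls;
    }
}
```

 For a file read without buffering, each of those calls would cross the boundary into native code, which is why the array-based form is usually preferred.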



From your results, are you seeing a big difference between iText and the
 competitor when you aren't flattening fields vs you are flattening fields?
 Your profiling results aren't indicating bottlenecks in that area of the
 code. If iText is much faster than the competitor in the non-flattening
 scenario, but slower than the competitor in the flattening scenario, I'm
 having a hard time reconciling the data presented so far.



 Giovanni Azua-2 wrote:


 I am sooo sorry the performance is worse with the change for pre-loading
 the PDFs in the test-case :(( the problem was that I ran the
 benchmarks with a small mistake in my test case ...

 Loading the HEADER demonstrates how to load flattened pre-formatted PDF
 part templates ...

 Loading the FOOTER demonstrates how to load PDF part templates containing
 fields that need to be populated.

 The mistake was to leave fixed the HEADER always ... so it would load only
 the flattened PDF template and not the footer (see below) [sigh] In any
 case is good to know that loading flattened PDF parts is cheaper.

 I mistakenly ran the last benchmark like this:

 private static byte[] file2ByteArray(String filePath) throws Exception {
     InputStream input = null;
     ByteArrayOutputStream output = null;
     try {
         input = new BufferedInputStream(new FileInputStream(HEADER_PATH));
         output = new ByteArrayOutputStream();
         int data = input.read();
         while (data != -1) {
             output.write(data);

             data = input.read();
         }

         return output.toByteArray();
     }
     finally {
         if (input != null) {
             input.close();
         }

         if (output != null) {
             output.close();
         }
     }
 }
 --

 ___
 iText-questions mailing list
 iText-questions@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/itext-questions

 Buy the iText book: http://www.itextpdf.com/book/
 Check the site with examples before you ask questions:
 http://www.1t3xt.info/examples/
 You can also search the keywords list:
 http://1t3xt.info/tutorials/keywords/


 --
 View this message in context: 
 http://old.nabble.com/performance-follow-up-tp28322800p28346146.html
 Sent from the iText - General mailing list archive at Nabble.com.



Re: [iText-questions] performance follow up

2010-04-23 Thread Mike Marchywka

Date: Fri, 23 Apr 2010 11:53:09 -0700
From: forum_...@trumpetinc.com
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] performance follow up

Parsing PDF requires a lot of random access. It tends to be chunked - move
to a particular offset in the file, then parse as a stream (this is why
paging makes sense, and why memory mapping is effective until the file gets

Yes, that is great but instead of a generic MRU approach are
there better predictions you can make, or even start loading pages
early instead of having to wait later etc? Maybe multithreading makes
sense here.

too big). But the parsing is incredibly complex. You can have nested
object structures, lots of alternative representations for the same type of
data, etc...

Surely there are rules, and I'm sure this topic has been beaten
to death in many CS courses ( as have stats LOL). Profiling
should point to some suspects. Algorithmic optimizations may
be possible, as may plain coding changes. Most compilers
operate sequentially on input, maybe in multiple passes, so I'm
sure you can find ideas easily in a variety of sources.

And we definitely don't know size of any of these structures ahead of time.

Well, you don't need to know it a week ahead of time, but
you could maybe waste an access or two finding sizes if that
can be done more quickly than just reading everything.


Re: [iText-questions] performance follow up

2010-04-23 Thread Mike Marchywka

 Date: Fri, 23 Apr 2010 14:53:57 -0700
 From: forum_...@trumpetinc.com
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] performance follow up

 I'd love to discuss specific ideas on prediction - are you familiar enough
 with the PDF spec to provide any suggestions?

No, I started to play with itext for some specific things
and then lost time/interest/need but I have a general interest
in the topic and may jump back in at some point. 

 Some obvious ones are the xref table - but iText reads that entirely into
 memory one time and holds onto it, so it seems unlikely that pre-fetch would
 do much there (other than having the last 1MB of the file be the first block
 pre-fetched - but any sort of paging implementation would handle that
 already).

 The rest... well, from my experience with this, you've got objects that
 refer to other objects that refer to other objects. And there's really no
 way to know where in the object graph you need to go until you parse and
 then go there. So I think I'll need some concrete examples of how this
 might be done with PDF structure - just to get my creativity going!

Well, in a case like that you may want to try to reorder and glean
all the stuff you need from what you have in memory before following
all the references. Along the lines of finding both delims in your array
and using String(byte[],offset,length) instead of append(char) a zillion
times, scan your current local objects for references they need and queue
those up before chasing after each one. It may turn out that sorting these
and getting them in some order creates a net time savings- I wouldn't
have believed this myself until I actually sorted a huge dataset
prior to running a program and it turned it from impractical
to practical runtime due to increased access coherence. Disk is
slower than the low level cache :)
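
The String(byte[],offset,length) idea can be sketched as follows. This is only an illustration under assumptions: the class name TokenSlice, the method betweenDelims and the sample bytes are made up here and are not iText code. It locates both delimiters first and then builds the string in a single call.

```java
import java.nio.charset.StandardCharsets;

final class TokenSlice {
    /**
     * Returns the text between the first 'open' byte and the next 'close'
     * byte, or null if that pair does not occur. No per-character append.
     */
    static String betweenDelims(byte[] buf, byte open, byte close) {
        int start = -1;
        for (int i = 0; i < buf.length; i++) {
            if (start < 0) {
                if (buf[i] == open) {
                    start = i + 1;          // first byte after the opening delimiter
                }
            } else if (buf[i] == close) {
                // One allocation for the whole slice.
                return new String(buf, start, i - start, StandardCharsets.ISO_8859_1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] data = "/Title (Hello PDF) /Author".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(betweenDelims(data, (byte) '(', (byte) ')'));
    }
}
```

Whether this beats a per-character append in practice depends on the JVM and the token sizes, so it is worth profiling both.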


Re: [iText-questions] performance follow up

2010-04-22 Thread Mike Marchywka

Cool, analysis is always a plus and easier
to discuss than adjectives. Just a few
rather trivial comments.

Date: Thu, 22 Apr 2010 02:02:31 +0200
From:
To: itext-questions@lists.sourceforge.net
Subject: [iText-questions] performance follow up

Hello,

Good news ... after applying the attached patch to trunk and doing yet
another performance experiment using the previously posted workload these are
the results:

[...]
Is iText with the patch better than before?

This of course is where you consult Mark Twain. LOL.
iText is or isn't better than before ( for some particular
use case) irrespective of the data you currently have
but the question is does the data allow
you to reject the conclusion that they have the same
execution times with some confidence level?

Finding ways to explain or attribute the noise with some kind
of model of course would be a reasonable thing to
consider if you had a few more test cases with some
relevant parameters ( number of fonts you will need or something).
The statistics are just a guide to help you
infer something causal- in this case perhaps something
like, did the patch cause itext to get better? as you
suggested originally. If you can start describing where and how much it got
better, response surfaces I guess, then of course you are starting
to develop strategy logic, and could take a given task and feed
it to the patched or non patched version ( among a new family
of alternative implementations) depending on
the parameters you know about it- obviously for the cases
you have only one decision makes sense and off hand, based
on what you said about the nature of the patch, I don't know of any
case where generating gratuitous garbage is a good strategy LOL.

The paired observation of the means are:

At this stage
it is usually helpful to look at the data, not
just start dumping it into equations you found in a book.
I'm not slamming you at all, just that it's helpful
to have a check on your analysis even if you
are using something canned like R or a commercial package,
more so if you just wrote the analysis stuff today.
Don't ignore things like histograms etc, after all my
criticisms of PDF for its ability to obscure information
with art, sometimes there are pictures worth a thousand words.
And of course use the pictures to suggest
various sanity checks you can write.

The Letter PDF looks good i.e. the patch didn't seem to break anything but
you will have to run the unit tests on it.

LOL, often people forget this step.

Also it sounds like the alt package is still faster by
a clinically significant amount- an amount relevant to someone.
There may be more coding optimizations or algorithmic optimizations.
For example, converting a string to a byte array could
have some benefits, hard to know off hand since that
may incur more java code than native code to manipulate
but something to consider in a more general case.
With a byte array you may be able
to avoid creating lots of temp strings, just make an int
table of the locations of new tokens or pass around indexes
instead of temp token strings. etc etc
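
A minimal sketch of the "int table of token locations" idea, assuming whitespace-separated tokens. The class name TokenIndex and its methods are hypothetical, and the whitespace set is copied from the isWhitespace test quoted elsewhere in this thread. One pass records start and end offsets, so no temporary String is created until a caller actually needs one.

```java
import java.util.Arrays;

final class TokenIndex {
    /** Same byte values as the isWhitespace test quoted in this thread. */
    static boolean isSpace(byte b) {
        return b == 0 || b == 9 || b == 10 || b == 12 || b == 13 || b == 32;
    }

    /**
     * Returns {start0, end0, start1, end1, ...}: the half-open byte ranges
     * of the tokens in buf. No String objects are allocated.
     */
    static int[] tokenBounds(byte[] buf) {
        int[] bounds = new int[buf.length + 1];   // enough for the worst case
        int count = 0;
        int start = -1;                            // -1 means "not inside a token"
        for (int i = 0; i <= buf.length; i++) {
            boolean ws = i == buf.length || isSpace(buf[i]);
            if (!ws && start < 0) {
                start = i;                         // token begins here
            } else if (ws && start >= 0) {
                bounds[count++] = start;           // token ended just before i
                bounds[count++] = i;
                start = -1;
            }
        }
        return Arrays.copyOf(bounds, count);
    }
}
```

For the bytes of "1 0 obj" this yields the pairs (0,1), (2,3) and (4,7); callers can compare bytes in place and only build a String for the tokens they keep.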


Re: [iText-questions] performance follow up

2010-04-22 Thread Mike Marchywka

 From: brave...@gmail.com
 Date: Thu, 22 Apr 2010 23:22:36 +0200
 To: brave...@gmail.com
 CC: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] performance follow up

 Hello,

 On Apr 22, 2010, at 10:59 PM, Giovanni Azua wrote:
 PRTokeniser.isWhitespace is a simple boolean condition that just happen to be 
 called gazillion times e.g. 35'622'000 times for my test workload ... if 
 instead of doing it like:

 public static final boolean isWhitespace(int ch) {
 return (ch == 0 || ch == 9 || ch == 10 || ch == 12 || ch == 13 || ch == 32);
 }

Does this have to be int vs char or byte? I think earlier I suggested
operating on byte[] instead of making a bunch of temp strings
but I don't know the context well enough to know if this makes sense.
Certainly De Morgan can help but casts and calls are not free either.

Also, maybe the hotspot runtime has gotten better but I have found in
the past that look up tables can quickly become competitive
with bit operators ( if your param is byte instead of int, a
256 entry table can tell you which classes the byte is a member of).
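
The 256 entry table could look roughly like this. It is a sketch, not the PRTokeniser code: the class name WhitespaceTable is invented, and the six values are taken from the isWhitespace method quoted above.

```java
final class WhitespaceTable {
    // One flag per possible byte value; true for the six whitespace codes
    // used by the quoted isWhitespace method (0, 9, 10, 12, 13, 32).
    private static final boolean[] WS = new boolean[256];

    static {
        for (int ch : new int[] {0, 9, 10, 12, 13, 32}) {
            WS[ch] = true;
        }
    }

    /** Table lookup for an int; values outside 0..255 are never whitespace. */
    static boolean isWhitespace(int ch) {
        return ch >= 0 && ch < 256 && WS[ch];
    }

    /** Table lookup for a byte; the mask maps -128..127 onto 0..255. */
    static boolean isWhitespace(byte b) {
        return WS[b & 0xFF];
    }
}
```

The table can be widened from boolean to a small bit set per entry if several character classes (delimiter, digit, and so on) need to be tested with one lookup. Whether it actually beats the chained comparisons on a given JVM is something to measure, not assume.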
 

 we used a bitwise binary operator with the appropriate mask(s), there could 
 be some good performance gain ...

 The function already exists in 
 http://java.sun.com/javase/6/docs/api/java/lang/Character.html#isWhitespace%28char%29
  I checked and it already uses bitwise binary operators with the right masks 
 ... we would only need to inline it to avoid the function call costs.

 Best regards,
 Giovanni

Re: [iText-questions] performance follow up

2010-04-22 Thread Mike Marchywka

From:
Date: Thu, 22 Apr 2010 22:59:43 +0200
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] performance follow up

Hello Mike,

On Apr 22, 2010, at 12:22 PM, Mike Marchywka wrote:

Good, this is exactly what I meant :)

The performance comparison is based on the representative test case exactly
as business wants it.

Yes, I know, I'm simply pointing out this is part of a bigger issue
that may or may not be relevant to itext but in general is something
to consider for long tasks. For example, maybe see FFTW.

As far as I know we need only two fonts: light and bold. So the number of
fonts is not a parameter.

I made that up as a simple strawman. If you make a model for execution
time given some parameters that are important, you can pick
a specific implementation that you expect to be faster. Again a
bit of an extrapolation to make your stats analysis more worthwhile.

This book is the official reference for the course in Advance System
Performance Analysis I am taking for my graduate CS Master program in the
top-10 Technology University of the world ... so no, it is not just equations
I found in a book :)

LOL, you need to see the Alan Greenspan interview, may have been on
CNBC or 60 minutes, talking about financial models and essentially
saying he didn't understand them but bright PhD's were doing it
so it must be right. The punchline is appeal to authority or credentials
when a factual argument is more to the point. This in fact is how
Mark Twain gets to the front so quickly. Again, I'm not suggesting
you did anything wrong or bad, I haven't actually checked numbers
or given the specific test a lot of thought- 9 data points is usually
not all that conclusive in any case and I guess that's my point.

Presumably you could keep measuring each case with and without patch
and slowly ( sqrt N) get a better estimate of average execution times.
Then, end up with 9 data points that are the difference in execution time
for each case with/without patch and asymptotically measure arbitrarily small
differences. However, it may be more helpful to look
at pictures like histograms or at least run various assumption checks.
You may have non-normal distros and those shapes can tell
you something about causes, not always but it helps to look
before taking one result and running with it.
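
As a rough illustration of the paired comparison being discussed, the sketch below computes the mean of the per-case differences and its standard error, which shrinks like 1/sqrt(N). The class name PairedDiff is hypothetical and any numbers fed to it here would be invented, not the benchmark results from this thread.

```java
final class PairedDiff {
    /**
     * Returns {mean, standardError} of the differences after[i] - before[i].
     * A t-based confidence interval is mean +/- t * standardError, with t
     * taken from a Student t table for n - 1 degrees of freedom.
     */
    static double[] meanAndStdErr(double[] before, double[] after) {
        int n = before.length;
        if (n < 2 || after.length != n) {
            throw new IllegalArgumentException("need two equal-length samples of size >= 2");
        }
        double sum = 0;
        for (int i = 0; i < n; i++) {
            sum += after[i] - before[i];
        }
        double mean = sum / n;

        double squares = 0;
        for (int i = 0; i < n; i++) {
            double d = (after[i] - before[i]) - mean;
            squares += d * d;
        }
        double sampleStdDev = Math.sqrt(squares / (n - 1));   // unbiased (n - 1) form
        return new double[] {mean, sampleStdDev / Math.sqrt(n)};
    }
}
```

With 9 paired cases there are 8 degrees of freedom, where the two-sided 95% t value is about 2.306. If the resulting interval contains zero, the data do not let you reject "no difference". The interval assumes roughly normal differences, which is exactly what a histogram helps to check.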

Now only 23.8% to go. We only need to make 4 more fixes like the last one and
the gap will be gone :) The Profiler shows there are still

I wouldn't count on the other one being the final solution to
pdf creation. Do you have symbols info or can you profile it at
all?


Re: [iText-questions] performance question

2010-04-21 Thread Mike Marchywka

 Date: Wed, 21 Apr 2010 11:56:00 +0200
 From: giovanni.a...@credit-suisse.com
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] performance question

 Hello Mike,

 I appreciate your interest and will to help. Thank you.

 Mike Marchywka [mailto:marchy...@hotmail.com] wrote:

Forgive the top posting but since this involves statistical modelling
with OP from a financial services company and the comments have
already started flying, allow me to throw my two sticks onto the
fire.

 I must say that I don't have a strong quantitative background quite yet ...
 but I am working on that :o)


I think I exercised great restraint not going off about PDF
being ideal for a CDO prospectus or some other inane comments
about modelling in financial companies :) It isn't you,
it's just the material is inherently funny at this point...
 
 


 The fact that I post from this email address only means that I have no access 
 to my private emails from the office so I have no choice.

 
Someone from Lehman posted on another list but now I digress...
 






 It may help us here formulate a constructive response
 if you could dig a bit deeper into the close method and see who
 the big resource hog is. Also, if you can point to the speed limiting
 step in your alt package it may be interesting to contemplate. We
 really don't know too much about details of your typical use case,
 ... snip ...
 sometimes IO dominates instead of computation just because no
 ... snip ...





 Indeed there is quite some IO involved but this is also why I am benchmarking:
 maybe one alternative does more IO than the other for the same use-case, or
 maybe my implementation of the use-case is not optimal, which would also be a
 valid outcome of the experiment :o)

 My assumption is that running the same workload under the same conditions (my
 development environment) should show if there is a significant performance
 difference between the two alternatives i.e. compare two means.



 I include the full code for the use-case and workload below.



 Also, since you went to all the trouble of doing a stats analysis,
 and since these things are supposed to be deterministic,
 it may help to get some idea how the execution time noise appears if
 indeed it is a significant fraction of the average. Presumably this
 is things like OS, other tasks on machine if you measured wall clock
 (not cpu time devoted to you) and GC and other stuff including
 maybe disk and memory cache states. I would point out that depending
 on exactly what you are measuring, you could be seeing lots of
 caching hot/cold issues that could dominate the results.





 I am aware of this, however, I would not seek a lot of isolation because it
 would be like creating a synthetic unrealistic environment for running the
 benchmarks e.g. if iText did in fact use more memory than the alternative I
 would not like to hide from the benchmark the consequent higher GC
 activity.


Again, it depends on what you are measuring. If you just want to
tell management approach X will require time T with distribution
blah ( normal+SD, or whatever you actually find with params to
describe it ) on server foo with a gazillion bytes of RAM etc then
that's fine. If you want someone here to figure out why itext
is slower, then pointing to a hogging method would help.
 
 
 
 
 


 A few notes about the micro-benchmarking I did:



 - I do warm up by running the use-case 1K times

Even here this is ambiguous: if I did this in a simple case
I'd do it from a bash script and the JVM startup time could be
significant and variable. Even running a java program once and putting
a 1k loop inside may or may not warm up caches, and it is probably
not realistic for your environment, but it would be good enough to point
to bottlenecks if you use the profiling tools ( see sun.com for
jhat or profiling ).
 
 


 - I then benchmark 1K times the elapsed time as shown in the method below
 performanceBenchmark

 - I do these two points above multiple times

 - The exact same thing is done for the alternative that generates (almost
 exactly) the same PDF

This can be an important issue- the final result that people care
about is usually just a bunch of pixels. If you can substitute cheaper
things that look ok, that could be a big deal. 
 


 - The dynamic allocation of Map of data and similar is emulating what will
 happen in the real implementation and this is done in the exact same way for
 the alternative.

 - I use the output 1K elapsed times for both alternatives to do the paired
 t-test following the recipe [1, 13.4 Comparing two alternatives] which
 outputs that iText is lagging behind [22.53, 24.18] milliseconds with 95%
 confidence meaning that one framework performs faster than the other and that
 the difference of the means is significant and not merely noise.

If you really

Re: [iText-questions] performance question

2010-04-20 Thread Mike Marchywka


Forgive the top posting but since this involves statistical modelling
with OP from a financial services company and the comments have
already started flying, allow me to throw my two sticks onto the
fire. 
 
It may help us here formulate a constructive response 
if you could dig a bit deeper into the close
method and see who the big resource hog is. Also, if you can
point to the speed limiting step in your alt package it may
be interesting to contemplate. We really don't know too much
about details of your typical use case, we won't forward your
comments to IRS or SEC ( LOL). Often you do find that mundane
things get taken for granted- I've found sometimes IO dominates
instead of computation just because no one looked at the code
and data gets copied many times and is moved byte by byte etc.
 
Also, since you went to all the trouble of doing a stats
analysis, and since these things are supposed to be deterministic,
it may help to get some idea how the execution time
noise appears if indeed it is a significant fraction of the average. Presumably 
this is things like OS, other tasks on machine if you
measured wall clock ( not cpu time devoted to you )  and GC and
other stuff including maybe disk and memory cache states. 
I would point out that depending on exactly what you are measuring,
you could be seeing lots of caching hot/cold issues that could dominate
the results. 
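
To separate wall clock time from CPU time actually given to the measuring thread, something like the sketch below could be used. The class name WallVsCpu is made up; System.nanoTime() and the ThreadMXBean calls are standard JDK APIs, but CPU timing is optional on a JVM, hence the support check.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

final class WallVsCpu {
    /**
     * Runs the task once and returns {wallNanos, cpuNanos}.
     * cpuNanos is -1 when this JVM cannot report per-thread CPU time.
     */
    static long[] time(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        boolean cpuSupported = threads.isCurrentThreadCpuTimeSupported();

        long cpuStart = cpuSupported ? threads.getCurrentThreadCpuTime() : 0L;
        long wallStart = System.nanoTime();
        task.run();
        long wall = System.nanoTime() - wallStart;
        long cpu = cpuSupported ? threads.getCurrentThreadCpuTime() - cpuStart : -1L;

        return new long[] {wall, cpu};
    }
}
```

A large gap between the two numbers suggests the run spent its time waiting (disk, other processes, paging) rather than computing. Note that the CPU figure covers only the calling thread, so work done by GC threads shows up in wall time but not in CPU time.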







 From: lrose...@adobe.com
 To: itext-questions@lists.sourceforge.net
 Date: Tue, 20 Apr 2010 06:07:13 -0700
 Subject: Re: [iText-questions] performance question

 And following up on point 3 – you have the source code, feel
 free to modify it for your personal needs.

 Leonard

 From: Paulo Soares [mailto:psoa...@glintt.com]
 Sent: Tuesday, April 20, 2010 8:48 AM
 To: Post all your questions about iText here
 Subject: Re: [iText-questions] performance question

 Hmm, there are lies, damn lies and statistics. While I don't
 dispute the 30% let's see the probable causes for this:

 - iText tries to do things correctly and avoids cutting corners that will
   come and bite you later. Metadata writing, appearance generation and so on.

 - iText is a generic PDF library. It reads, writes and modifies PDFs. Any
   library designed with a narrower purpose can optimize the interested areas
   to perform better.

 - iText comes with source and can be extended, modified, altered. This implies
   that a sensible and probably heavier structure must be in place to allow
   that. If you have a closed source library with just a single purpose things
   can be done faster as that's all it's going to do.

 - com.itextpdf.text.pdf.PdfStamperImpl.close() is where everything is written
   to file, if you avoid calling this nothing will come out.

 - There are some speed and memory improvements in the pipeline but I don't
   know how much % improvement will result or in what areas.

 Paulo

 From: Azùa Giovanni (KSXD 32) [giovanni.a...@credit-suisse.com]
 Sent: Tuesday, April 20, 2010 1:12 PM
 To: Post all your questions about iText here
 Subject: [iText-questions] performance question

 Hello,

 For a specific Letter generation use-case I prepared a test of statistical
 significance using a paired t-test for comparing the performance [1] of iText
 vs a commercial PDF framework. The experiment shows that for our relevant
 use-case iText underperforms by 30% with 95% confidence.

 I did some further investigation of the iText code for this specific use-case
 and found the following call to be among the top most expensive calls:

 com.itextpdf.text.pdf.PdfStamperImpl.close
 (line 189) taking up to 195 milliseconds

 The code that invokes such method is the following:



 private static void appendFooter(PdfWriter writer) throws Exception {
     // Placeholder values keyed by form field name
     Map<String, String> replacements = new HashMap<String, String>();
     replacements.put("ph0001", "X X");
     replacements.put("ph0002", "Head of Customer Acquisition");
     replacements.put("ph0003", "XX XXX");
     replacements.put("ph0004", "Head of Customer Satisfaction");

     // Fill the footer template's fields into an in-memory PDF
     PdfReader footerReader = new PdfReader(FOOTER_PATH);
     ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
     PdfStamper stamper = new PdfStamper(footerReader, outputStream);

     AcroFields form = stamper.getAcroFields();
     for (Map.Entry<String, String> entry : replacements.entrySet()) {
         form.setField(entry.getKey(), entry.getValue());
     }
     stamper.setFormFlattening(true);
     stamper.close();

     int pageOne = 1;
     int xOffset = 5;
     int yOffset = -560;

     // Re-read the flattened footer and place it on the letter
     PdfReader memoryReader = new PdfReader(outputStream.toByteArray());
     PdfImportedPage importedPage = writer.getImportedPage(memoryReader, pageOne);
     PdfContentByte content = writer.getDirectContent();
     content.addTemplate(importedPage, xOffset,

Re: [iText-questions] Null pointer exception with PdfStamper.close()

2010-04-10 Thread Mike Marchywka

Date: Sat, 10 Apr 2010 14:35:40 +0200
From: i...@1t3xt.info
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] Null pointer exception with PdfStamper.close()

Mike Marchywka wrote:
I would mention the source code is available and compiling with debug info
may get you a quicker answer sometimes. With something called fill and
properties there is probably something informative, like keys in a
hashtable, that will tell you what you forgot (for example, it used
a key importantInfo to get a result from a hashtable and assumed it would
be non null but you just forgot the code that inserts importantInfo).

Yes, that's correct.

I've looked at the fillOCProperties() method, and there are plenty of
places where a value is retrieved from a PDF dictionary using some key,
but in all cases, there's a check if (value != null) before something
important happens, and I need a standalone example + PDF that causes
the problem to find out what goes wrong.

I'm writing mobile phone apps and don't usually have stack or line information.
When I get an NPE like this and have the situation you describe it usually
turns out I have a null table. There may not be too many other options,
although I guess depending on the parameters you could be passing a null, but
this could throw illegal arg depending on how it is used etc. Not sure now if
Hashtable.get(null) throws IllegalArg or not.
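
On the Hashtable.get(null) question, a quick check with the standard java.util classes settles it. The helper class NullKeyLookup below is only a test harness invented for this note.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

final class NullKeyLookup {
    /** Looks up a null key and reports what the map did. */
    static String describe(Map<String, String> map) {
        try {
            return "returned " + map.get(null);
        } catch (RuntimeException e) {
            return "threw " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println("Hashtable: " + describe(new Hashtable<String, String>()));
        System.out.println("HashMap:   " + describe(new HashMap<String, String>()));
    }
}
```

Hashtable documents a NullPointerException for a null key, not an IllegalArgumentException, while HashMap accepts a null key and simply returns null when nothing is stored under it. Either way a missing non-null key yields null, so an unchecked dereference of the result is the other common route to an NPE.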

--
This answer is provided by 1T3XT BVBA
http://www.1t3xt.com/ - http://www.1t3xt.info


Re: [iText-questions] iText compression modes

2010-03-31 Thread Mike Marchywka

 Date: Wed, 31 Mar 2010 03:51:35 -0700
 From: 
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] iText compression modes

 Hello Bruno

 We are need to be able to compress the PDF files as much as we can to store
 our clients' PDF files.

Apparently you are responding to a message that discusses compression
and the need to add code to itext to do that as part of the creation
process. This would presumably be the best non-lossy compression as
it knows the data and isn't just empirically trying to find statistical
patterns, which is what happens if, for example, you use winzip on a pdf file.
Contrary to what many managers may think, the PDF
creation process doesn't add information ( gee, we just multiplied the
file size by 10, look at all that new information and knowledge we generated)
and it may in fact be that the best compression is simply to save all the
input data, although others have pointed out that this input data
may be quite covert and difficult to find if you want pixel level
reproducibility.
This input data would in fact be something like a decomposition of your PDF,
if you had a way to do that, as is done with audio in something like ACELP,
but this would be lossless- ACELP tries to fit an arbitrary waveform to a
limited input model and doesn't always work, whereas you already have the
input data. Image compression of course is often lossy, and you may not have
a lossless restriction.

If you really do want minimum file size and don't care about speed or
other attributes and can tolerate lossy compression, there are a lot of
options. You could even try some lossless data compression things like bzip2
I guess, but again it is better and faster if you ( the compression
algorithms) know what is in the data rather than having to guess or
discover it.
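
A small experiment shows why a generic lossless compressor gains little on data whose redundancy has already been squeezed out. It is a sketch under assumptions: java.util.zip.Deflater stands in for bzip2 (which is not in the JDK), seeded pseudo-random bytes stand in for already-compressed image data such as JPEG, and the class name DeflateDemo is invented.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.zip.Deflater;

final class DeflateDemo {
    /** Returns the number of bytes produced by DEFLATE at maximum effort. */
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        byte[] chunk = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(chunk);   // count output, discard the bytes
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        byte[] repetitive = new byte[100000];
        Arrays.fill(repetitive, (byte) 'a');     // highly redundant input

        byte[] noise = new byte[100000];
        new Random(42).nextBytes(noise);         // stand-in for compressed image data

        System.out.println("repetitive: " + deflatedSize(repetitive) + " bytes");
        System.out.println("noise:      " + deflatedSize(noise) + " bytes");
    }
}
```

The redundant input shrinks to a tiny fraction of its size while the noise stays at roughly its original size, which is the situation a PDF made mostly of JPEG images is in.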

 It is VITAL for us to do that and we would like to keep us using iText.

 I've compressed some JPEG (10 files approx. 3.3Mb in total) files using
 ImageMagick
 [convert *.jpg -alpha off -monochrome -compress Zip -quality 100 -units
 PixelsPerInch -density 600 image_deflate.pdf]
 and the generated PDF was 700kB while using iText was 3.6Mb.

 As the storage is a BIG concern for us we are trying to find a solution.

 I personally dug into your book for compression tips but even with
 PdfStamper  setFullCompression the file was still 3.3Mb.

 ANY help or suggestion will be much appreciated.

 Bruno Lowagie (iText) wrote:

 Bruno Lowagie wrote:
 How much does that matter for your customer?

 I've just checked. Introducing the concept of compression
 level would involve changing about 20 classes.

 It would be possible to set the compression level:
 - on the writer level (mostly page content streams)
 - some Image streams
 - font streams
 - embedded file streams

 I'll see if I have the time to do this.
 While I'm at it, I could also look at the encryption
 of embedded file stream.

 Is there anybody I can invoice for this work?
 Tobias' customer? Tony's customer?
 br,
 Bruno


Re: [iText-questions] iText compression modes

2010-03-31 Thread Mike Marchywka

Date: Wed, 31 Mar 2010 13:28:31 +0200
From: i...@1t3xt.info
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] iText compression modes

scriptoid wrote:
Hello Bruno

We are need to be able to compress the PDF files as much as we can to store
our clients' PDF files.

It is VITAL for us to do that and we would like to keep us using iText.

I've compressed some JPEG (10 files approx. 3.3Mb in total) files using
ImageMagick
[convert *.jpg -alpha off -monochrome -compress Zip -quality 100 -units
PixelsPerInch -density 600 image_deflate.pdf]
and the generated PDF was 700kB while using iText was 3.6Mb.

You're mixing different concepts. When you set the compression for an
image in iText, you are talking about LOSSLESS compression. When you set

I guess I'd also mention that lossless to you means preservation of pixels
and maybe some document structure ( LOL, although people seem to not want to
put this in anyway), not arbitrary stuff like the order of dictionary entries
or something (I'm making stuff up since I don't know PDF innards well enough
but others have pointed out that something can be permuted or moved without
effect on pixels coming out ). Winzip of course wouldn't know that,
but the PDF compression could maybe benefit from uninformative
ordering and allocate no bits for it in the compressed format. Again, however,
all of this is generated from a set of input data that is probably the most
concise representation of your PDF file you will get. Your images are unlikely
to compress better once they are mangled into a PDF file, but an image
compression algorithm for your source images and text compression for your
text and font compression for your ( non-redundant) fonts would be a better
way to go. Of course, decompressing all of this ( regenerating the PDF again )
could take a lot of time.

the compression for an image in ImageMagick, you're talking about LOSSY
compression.
In iText, the number of pixels (the resolution) isn't changed. I'm sure
that ImageMagick reduces the resolution.

I personally dug into your book for compression tips but even with
PdfStamper setFullCompression the file was still 3.3Mb.

Read section 10.2.6 of the second edition: Lossless compression won't
result in dramatic file size reduction. However, if lossy compression is
acceptable, you could use the java.awt.Image to reduce the quality.
Listing 10.12 shows an example named CompressAwt that explains how to
reduce the resolution.
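
To make the lossless/lossy distinction concrete, here is a minimal sketch of lossy downscaling with the standard java.awt classes. It is not the book's CompressAwt listing; the class and method names (DownscaleSketch, downscale, toJpeg) are made up for illustration, and the 0.5 scale factor is an arbitrary assumption.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of iText: reduces the pixel count of an image.
final class DownscaleSketch {

    // Returns a copy of src scaled by factor (0 < factor <= 1).
    // Pixels are thrown away, so this step is lossy by construction.
    static BufferedImage downscale(BufferedImage src, double factor) {
        int w = Math.max(1, (int) Math.round(src.getWidth() * factor));
        int h = Math.max(1, (int) Math.round(src.getHeight() * factor));
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(src, 0, 0, w, h, null);
        } finally {
            g.dispose();
        }
        return dst;
    }

    // Encodes the image as JPEG (a second lossy step) and returns the bytes.
    static byte[] toJpeg(BufferedImage img) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(img, "jpg", out)) {
            throw new IOException("no JPEG writer available");
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a scanned page; a real caller would load it with ImageIO.read.
        BufferedImage page = new BufferedImage(1200, 800, BufferedImage.TYPE_INT_RGB);
        BufferedImage small = downscale(page, 0.5);
        System.out.println(small.getWidth() + "x" + small.getHeight()); // 600x400
        System.out.println(toJpeg(small).length + " bytes as JPEG");
    }
}
```

The resulting byte array could then be handed to iText through Image.getInstance(byte[]). How much the file shrinks depends entirely on the source images, so measure rather than assume.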

If you have hints to phrase this in a better way so that people don't
have the same question you have, feel free to post your suggestions here.
--
This answer is provided by 1T3XT BVBA
http://www.1t3xt.com/ - http://www.1t3xt.info


Re: [iText-questions] pdf graphics file questions

2010-03-26 Thread Mike Marchywka

 Date: Fri, 26 Mar 2010 10:28:04 -0700
 To: itext-questions@lists.sourceforge.net
 From: bri...@ananya.com
 Subject: Re: [iText-questions] pdf graphics file questions

 Hi,

 (I sent this post on 3/22, but it seems it never got out.)

 Thanks! I hope I will find the information about AICB.

 Well, I am still a total beginner. So what are the content
 parser classes?

 I know how to write a graphics PDF file, but I would like to
 have detailed instructions how to read a graphics PDF
 file and translate it into Java code.


You guys need a list of frequently used links containing a link to
one or more open source renderers. I downloaded one of these
and found it quite helpful for dumping things. 

Also, I'm not sure what exactly you are trying to do, but for testing
something like this you could probably find better test vehicles. After
integration you may not notice much difference in total performance on many
PDF files, as much of the time seems to go to parsing and stuff other than
filling in pixels. But I guess if you want to verify equivalence to some
other thing, it is good to be able to compare pixel-for-pixel results.




 Thanks for everything!


 At 11:14 AM 03/22/10, you wrote:
You can certainly look into it - it appears that other products
support it, so it may be published...

Yes, you can use the new content parser classes to find all the
vector drawing commands in the PDF - but they are just the series of
commands that you will need to turn into something for your needs.

-Original Message-
From: Brigit Ananya [mailto:bri...@ananya.com]
Sent: Monday, March 22, 2010 1:13 PM
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] pdf graphics file questions

Hi,

Thanks a lot, Mr. Rosenthol! So, do you mean that the
AICB is not private to Adobe, that I could learn it?

I will look at the Adobe Illustrator SDK.

So, besides trying to learn AICB, my only remaining
question is:
With iText, is it possible to read the array of
CubicCurve2D.Doubles and the stroke and fill information
from a PDF graphics file of curves?
Well, this is probably a question for someone else.

Thanks in advance for responding.



Re: [iText-questions] Specifying HTTPClient to access URL in iText

2010-03-25 Thread Mike Marchywka

Date: Wed, 24 Mar 2010 15:01:57 -0700
From: rthanga...@ebay.com
To: itext-questions@lists.sourceforge.net
Subject: [iText-questions] Specifying HTTPClient to access URL in iText

Hi,

I am using iText2.0.7 jar to create a PDF document.
I am passing the imageurl to the Image.getInstance() method to get the
Image. But i am getting the below exception

java.net.ConnectException: Connection timed out: connect

I understand this is due to the firewall setting on the image server side.

Does iText provide a way to specify the image URL as well as our own
HTTPClient, so that I can specify a proxy to establish the connection?

You want to pass it a connection of some kind?
I think the alternative I've seen is to use the byte[] signature. It is very
difficult to make an API that takes a URL and gives you
complete flexibility when all you want is some unrelated widget.
I guess something that takes a connection would not be unreasonable,
but at that point you may as well fetch the data yourself
and pass that into the method, which has been discussed here before.
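
A minimal sketch of the "fetch the data yourself" approach, using only java.net so that the proxy is under the caller's control. The class name, the timeouts, and the command-line arguments are assumptions for illustration; an Apache HttpClient could fill the same role.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper: downloads image bytes through an explicit HTTP proxy.
final class FetchImageBytes {

    // Drains a stream into a byte array.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Opens the URL through the given proxy and returns the response body.
    static byte[] fetchViaProxy(URL url, String proxyHost, int proxyPort) throws IOException {
        Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(proxyHost, proxyPort));
        URLConnection conn = url.openConnection(proxy);
        conn.setConnectTimeout(10_000); // arbitrary values; tune for your network
        conn.setReadTimeout(30_000);
        try (InputStream in = conn.getInputStream()) {
            return readAll(in);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.out.println("usage: FetchImageBytes <imageUrl> <proxyHost> <proxyPort>");
            return;
        }
        byte[] data = fetchViaProxy(URI.create(args[0]).toURL(), args[1], Integer.parseInt(args[2]));
        System.out.println(data.length + " bytes fetched");
        // The bytes can then go to iText's byte[] overload, e.g. Image.getInstance(data).
    }
}
```

This keeps all connection policy (proxy, timeouts, authentication) outside iText, which only ever sees the finished byte array.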

Thanks in advance.

--
View this message in context:
http://old.nabble.com/Specifying-HTTPClient-to-access-URL-in-iText-tp28019932p28019932.html
Sent from the iText - General mailing list archive at Nabble.com.


Re: [iText-questions] Querry regarding iText jars: What is the Maxsize a PDF can be generated ?

2010-03-24 Thread Mike Marchywka

 From: psoa...@glintt.com
 To: itext-questions@lists.sourceforge.net
 Date: Wed, 24 Mar 2010 16:43:27 +
 Subject: Re: [iText-questions] Querry regarding iText jars: What is the Maxsize a PDF can be generated ?

 Not really deliberate. iText started 10 years ago,
 when a gigabyte was a lot bigger than it is now. 2G looked like enough then, and now
 nobody has the time to change the code to support more. I also doubt that there's
 a need for it, save for a couple of people (Gylfi included).

Is this the large-file issue that normally strikes at about 4 GB when you have
unsigned 32-bit ints?

I thought most iText APIs used streams and therefore had no real limits
related to file size.
Any user could presumably subclass a file or stream as long as nothing inside
iText uses random access with 32-bit indices.
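
The arithmetic behind the 2 GB and 4 GB figures can be checked directly. This sketch only illustrates integer ranges in Java; it makes no claim about where iText actually stores offsets, and the class name is made up.

```java
// Illustration of why 32-bit byte offsets cap file sizes at about 2 GB or 4 GB.
final class OffsetLimits {

    static final long MAX_SIGNED_32 = Integer.MAX_VALUE;   // 2^31 - 1, just under 2 GiB
    static final long MAX_UNSIGNED_32 = (1L << 32) - 1;    // 2^32 - 1, just under 4 GiB

    // True if a byte offset can be stored in a Java int without wrapping.
    static boolean fitsInInt(long offset) {
        return offset >= 0 && offset <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long threeGiB = 3L * 1024 * 1024 * 1024;
        System.out.println(MAX_SIGNED_32);        // 2147483647
        System.out.println(MAX_UNSIGNED_32);      // 4294967295
        System.out.println(fitsInInt(threeGiB));  // false
        System.out.println((int) threeGiB);       // -1073741824: the offset wrapped
    }
}
```

Java has no unsigned int type, so code that keeps offsets in an int hits the signed limit of about 2 GB; the 4 GB figure applies to formats or languages that use unsigned 32-bit fields. Switching such fields to long is the usual remedy.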










 Paulo


 - Original Message -
 From: Gylfi Ingvason
 To: 'Post all your questions about iText here'
 Sent: Wednesday, March 24, 2010 4:27 PM
 Subject: Re: [iText-questions] Querry regarding iText jars: What is the Maxsize a PDF can be generated ?

 Don't know about the Java version of iText, but last time I
 checked, iTextSharp did not support generating PDF files greater than 2 GB and
 my impression from Paulo was that this was deliberate and that adding that
 support was not being planned.

 From: Leonard Rosenthol [mailto:lrose...@adobe.com]
 Sent: Wednesday, March 24, 2010 12:03 PM
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] Querry regarding iText jars: What is the Max size a PDF can be generated ?

 PDF supports files that are hundreds/thousands of petabytes(!) in
 size. iText, however, may be limited to the original 10 gigabyte
 limitation.

 From: G Chalpati Rao [mailto:gchalpati...@yahoo.com]
 Sent: Wednesday, March 24, 2010 9:28 AM
 To: itext-questions@lists.sourceforge.net
 Subject: [iText-questions] Querry regarding iText jars: What is the Max size a PDF can be generated ?

 Hi,

 I have a query regarding the usage of iText jars.

 What is the maximum size of a file that can be converted to the PDF format?

 Please respond as soon as possible.

 What are the APIs we should use to convert a big file to PDF format?

 I need a file size of more than 400 MB.

 Thanks & Regards

 G.C.Rao






 




Re: [iText-questions] Perfomance Question - ByteArray vs Files

2010-03-21 Thread Mike Marchywka

 From: lrose...@adobe.com
 To: itext-questions@lists.sourceforge.net
 Date: Sat, 20 Mar 2010 19:50:02 -0700
 Subject: Re: [iText-questions] Perfomance Question - ByteArray vs Files

 I would, of course, argue with #3 as would the governments of every country 
 on earth, every major enterprise around the world, etc.

But you let #2 go by without comment? LOL. This is becoming a debate on
religion. If you are going to defend PDF in a technical forum by appeal to
popularity among large organizations without a primary focus on technology,
well, I won't feel too bad about anything I post :)
You have discussed the voting machine, now what about the weighing machine?
LOL.

 -Original Message-
 From: warren [mailto:warrenonsourcefo...@charter.net]
 Sent: Saturday, March 20, 2010 2:34 AM
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] Perfomance Question - ByteArray vs Files

 Ok. So the answers are

 1) To understand how PDFReader works, I need to dig into the source files
 and attempt to learn the underpinnings of the iText code and JAVA. I hadn't
 planned on this since I am implementing iText from another language like a
 black box. I've only played with JAVA directly on a limited basis. Guess
 I'll have to dive in.

 2) Disk I/O is Bad

 3) PDFs are Bad. Not sure I have much choice since PDF is what the
 customer wants.

 4) Empirical testing is the answer. Code both methods and test various
 conditions. If results are bad, dig deeper. I have limited access to the
 server but I'll see what tools are available.


Re: [iText-questions] How to test the pdf output?

2010-03-21 Thread Mike Marchywka

 Date: Sun, 21 Mar 2010 19:21:21 +0800
 From: 
 To: itext-questions@lists.sourceforge.net
 Subject: [iText-questions] How to test the pdf output?

 Hi,

 What's the *best* way of regression testing my Java code that
 generates pdf documents? Comparing the pdf file byte to byte to
 reference documents? How do you do regression testing for iText?

I'd been advocating use of an open source renderer and doing
pixel compares, but I'm not actually active in the field.
Presumably you can also instrument
the renderer to dump various lists or hashes and compare these.
I guess you need to think of this somewhat like regression testing
of floating point algorithms: close has to be good enough,
or you will get lots of spurious errors. If you really
want to get fancy, and I'm just speculating here,
you may be able to find image analysis or compression programs
which can try to compress the difference image between
ref and test pages and diagnose or attribute the differences
to some types of features, giving you some indication of whether they are
perceptually important without having to open the image itself.
Lossy compression usually tries to capture only the stuff a viewer is likely
to care about.
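
A minimal sketch of a tolerant pixel compare, assuming the reference and test pages have already been rendered to BufferedImage objects by some renderer (the rendering step is not shown). The class name, the per-channel tolerance, and the idea of counting differing pixels are illustrative choices, not an established iText testing method.

```java
import java.awt.image.BufferedImage;

// Hypothetical regression helper: compares two rendered pages "close enough".
final class PixelDiff {

    // Counts pixels whose red, green, or blue value differs by more than tolerance.
    static int countDifferingPixels(BufferedImage ref, BufferedImage test, int tolerance) {
        if (ref.getWidth() != test.getWidth() || ref.getHeight() != test.getHeight()) {
            throw new IllegalArgumentException("image sizes differ");
        }
        int differing = 0;
        for (int y = 0; y < ref.getHeight(); y++) {
            for (int x = 0; x < ref.getWidth(); x++) {
                int p = ref.getRGB(x, y);
                int q = test.getRGB(x, y);
                if (channelDelta(p, q, 16) > tolerance
                        || channelDelta(p, q, 8) > tolerance
                        || channelDelta(p, q, 0) > tolerance) {
                    differing++;
                }
            }
        }
        return differing;
    }

    // Absolute difference of one 8-bit colour channel selected by shift.
    private static int channelDelta(int p, int q, int shift) {
        return Math.abs(((p >> shift) & 0xFF) - ((q >> shift) & 0xFF));
    }

    public static void main(String[] args) {
        BufferedImage ref = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        BufferedImage test = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        test.setRGB(0, 0, 0x020202); // tiny rounding-style difference
        test.setRGB(1, 1, 0xFFFFFF); // a real difference
        System.out.println(countDifferingPixels(ref, test, 3)); // 1
        System.out.println(countDifferingPixels(ref, test, 0)); // 2
    }
}
```

A test would then fail only when the count exceeds some small budget, which avoids spurious failures from anti-aliasing or rounding differences between renderer versions.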

 Thanks

 Fred


Re: [iText-questions] Perfomance Question - ByteArray vs Files

2010-03-21 Thread Mike Marchywka

 From
 To: itext-questions@lists.sourceforge.net
 Date: Sun, 21 Mar 2010 14:26:38 -0500
 Subject: Re: [iText-questions] Perfomance Question - ByteArray vs Files

 This is not about general coding philosophy or the merits of PDFs. I am in
 an iText forum specifically because I need to produce a PDF. I always
 evaluate what the customers needs and wants are to produce the proper
 output. If I didn't want a PDF, I wouldn't be here. End of story.

That doesn't mean you need to produce a PDF de novo every time someone
from Google finds your site; we are talking about optimizations given
constraints. PDF is expensive compared to alternative ways to preview, for example.



 In my opinion, one has to balance coding with resources. I've been on
 servers where developers don't do this, resulting in server crashes and poor
 performance. I don't like it so I always try to understand what is going on
 under the covers to figure out where basic intelligent tradeoffs can be
 made. Yes, there are different techniques that one can use to achieve this
 balance and we can have endless discussions about that. But not now.

 I am nowhere near ready to get into advanced techniques in Java or looking
 for bottlenecks buried deep in the server, especially since I'm on a closed
 server with very limited access. Not being a Java programmer and being new
 to iText, I'm at a big disadvantage. I was hoping I could treat iText more
 like a black box and, with a basic understanding, use it efficiently.

 I was trying to ask a specific technical question targeted at memory
 utilization of PDFReader and PDFStamper. Seemed pretty obvious to me that
 since I was making two passes there was a tradeoff here. Somewhere we went
 off the tracks.

 I am very disappointed there isn't technical expertise available on this
 forum that can give an overview of the process and answer the question.


If you don't have the expertise yourself to evaluate the strategies we have
outlined, how do you even state with confidence

[...]I've been on
 servers where developers don't do this, resulting in server crashes and poor
 performance. 

Maybe they are doing the best they can with the constraints available
and the real problem is that you just need to buy a bigger server?

Nobody here knows anything about the statistics or parameters of your system
or data. If you want some general ideas to discuss with your own experts, 
I hope we could help. Otherwise, you will have to hope someone who has looked
at the relevant code can just give you a simple answer.

I don't even remember your specific question but what exactly would
you do with the answer? The other thing is that people like you often
ask questions that don't address their likely real concerns. Talking
around a little, time permitting, may help solve an underlying
problem or answer a question someone else has while browsing this in the 
archives.

Sometimes people do code kluges for the sake of pushing things out the door.
Maybe you have spotted something that is in fact a coding placeholder
but again it would be easier for someone to just look at the source
code or a heap dump than asking such a question. If I still had the code
somewhere I'd be tempted to look but still not sure what you would do
with an answer. Do you just want someone to read the code to you?
Perhaps you could try asking the original question again.


Not that it would help, but I would point out that multiple passes
through a dataset are generally bad due to loss of memory locality.
This invites thrashing: usually just of the lower-level memory caches, but
it can make your disk light stick on due to VM thrashing. The strategy
is to do block-oriented operations in sizes small enough that you only touch
things in the lower-level caches.
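
As a small illustration of the block-oriented, single-pass idea (not of anything iText does internally), the sketch below computes two results, a byte count and a CRC-32, in one pass over a stream using a fixed-size buffer instead of reading the data twice. The class name and the 64 KB block size are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

// Hypothetical example: one pass over the data, one small reusable block.
final class SinglePassBlocks {

    // Returns {total bytes, CRC-32} while holding only blockSize bytes in memory.
    static long[] sizeAndCrc(InputStream in, int blockSize) throws IOException {
        CRC32 crc = new CRC32();
        long total = 0;
        byte[] block = new byte[blockSize];
        int n;
        while ((n = in.read(block)) != -1) {
            crc.update(block, 0, n); // both computations touch the block
            total += n;              // while it is still hot in cache
        }
        return new long[] {total, crc.getValue()};
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1_000_000];
        long[] result = sizeAndCrc(new ByteArrayInputStream(data), 64 * 1024);
        System.out.println(result[0] + " bytes, crc32=" + Long.toHexString(result[1]));
    }
}
```

Whether a given block size actually stays in cache depends on the hardware and the JVM, so the right size is something to measure.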









  

Re: [iText-questions] Perfomance Question - ByteArray vs Files

2010-03-20 Thread Mike Marchywka

From: warrenonsourcefo...@charter.net
To: itext-questions@lists.sourceforge.net
Date: Fri, 19 Mar 2010 20:34:02 -0500
Subject: Re: [iText-questions] Perfomance Question - ByteArray vs Files

Ok. So the answers are

1) To understand how PDFReader works, I need to dig into the source files
and attempt to learn the underpinnings of the iText code and JAVA. I hadn't
planned on this since I am implementing iText from another language like a
black box. I've only played with JAVA directly on a limited basis. Guess
I'll have to dive in.

Well, that was my answer, but no one who knows the code has volunteered more
details.
If you post a heap dump and ask why
certain things are being made, someone may or may not know.

2) Disk I/O is Bad

Doing anything is bad; as you point out, allocating and holding is bad.
VM just seems to be an unappreciated bottleneck (but it's all in memory and
you can always just go buy more memory, that is cheap today LOL).

3) PDFs are Bad. Not sure I have much choice since PDF is what the
customer wants.

Personally, I think they tend to be used where other formats would do, and
they are used in such a way that the files increase in size and decrease in
information content. Certainly if you have text and can just dump that to the
browser, that is faster and uses less memory than adding artwork. It also may
just be a matter of obviousness too: if you were generating HTML, you might
easily identify constant blocks of HTML you could cache instead of
regenerating each time, and in PDF, if you don't know the details, it may be
less obvious.

4) Empirical testing is the answer. Code both methods and test various
conditions. If results are bad, dig deeper. I have limited access to the
server but I'll see what tools are available.

Well, it is easy to get confused, and the more direct tests are more direct.
If you just measure the time intervals routinely (System.currentTimeMillis, IIRC,
in Java),
you can get some idea where the problems may be. Memory allocations you
should be able to examine on your desktop easily. I think my suggestion was that
for expensive efforts, spending more time on strategy selection or doing
things like sorting the data may produce a net benefit; again, it all depends.
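
A minimal sketch of the routine interval timing described above, using System.currentTimeMillis. The StageTimer class and the stage labels are invented for illustration, and the sleeps merely stand in for real work such as a database query or a stamping pass.

```java
// Hypothetical helper: prints how long each stage took since the previous call.
final class StageTimer {

    private long last = System.currentTimeMillis();

    // Prints and returns the milliseconds elapsed since construction or the last lap.
    long lap(String label) {
        long now = System.currentTimeMillis();
        long elapsed = now - last;
        last = now;
        System.out.println(label + ": " + elapsed + " ms");
        return elapsed;
    }

    public static void main(String[] args) throws InterruptedException {
        StageTimer timer = new StageTimer();
        Thread.sleep(50);                  // stand-in for the database query
        timer.lap("query");
        Thread.sleep(20);                  // stand-in for the first PDF pass
        timer.lap("first pass");
        Thread.sleep(10);                  // stand-in for the stamping pass
        timer.lap("stamping");
    }
}
```

System.nanoTime is the monotonic alternative when intervals are short or the wall clock may be adjusted; for coarse "which stage dominates" questions, millisecond wall-clock differences are usually enough.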


Re: [iText-questions] Perfomance Question - ByteArray vs Files

2010-03-19 Thread Mike Marchywka

From:
To: itext-questions@lists.sourceforge.net
Date: Fri, 19 Mar 2010 16:16:14 -0500
Subject: [iText-questions] Perfomance Question - ByteArray vs Files

I'm creating a PDF in two passes with my goal to
end up with it as a file on the server. The first pass creates the PDF and
the second adds things like headers, footers, etc. using PDFStamper. The
PDF is being generated from a database so there is a possibility that it could
get to be large (a few hundred pages?).

This really has nothing to do with iText, but some people have discussed
performance issues, and indeed the inner iText implementations may want to
vary depending on what the user can say a priori about sizes etc. (for large
tasks, spending some time up front picking a strategy or specific implementation
can pay off). And, of course, I'm a perennial complainer about the resources
related to the PDF file versus alternatives.

First, it may really help if you profile whatever you have: if there is
anything slower than something called PDF, a highly loaded DB could be it.
Do you keep requesting the same (static) data from it? Etc., etc.

Of course, trying to do everything in memory sounds faster until you
find out that your memory is virtual and you keep thrashing. If
you want to rely on the OS, great, but if you think you can do better
you may benefit from reading/writing to disk the stuff you want
instead of making a huge heap and letting the VM system deal with it.
Once you are all in physical memory, then you want to try
to keep locality and stay in a lower-level memory cache (hard with Java).
On some large data sets in other settings, I have used a sort (yes, another
slow thing) to stop memory thrashing, and the speed improvement was an order
of magnitude (from essentially unusable to quite tolerable).

So, I guess the most authoritative answer is, it depends.

Right now I have the PDFWriter directing the output
to a FileOutputStream. Once that is done, the PDFReader picks it up and
is used by PDFStamper to process and send the PDF to the server
using another FileOutputStream.

It occurred to me that I might be doing this
wrong. If PDFReader brings the whole PDF into memory, wouldn't it be
better to have PDFWriter put the PDF out as a ByteArrayOutputStream which (I
think) PDFReader can pick up? Or does PDFReader only bring it in as it
needs it? Or is there some other issue I'm missing?

I'm not really clear on what the tradeoffs are
between running everything out to files and accepting the I/O, or keeping
everything in memory.

Can anyone give me some guidance?

Thanks!

Warren

Re: [iText-questions] Perfomance Question - ByteArray vs Files

2010-03-19 Thread Mike Marchywka

 From: warrenonsourcefo...@charter.net
 To: itext-questions@lists.sourceforge.net
 Date: Fri, 19 Mar 2010 17:34:37 -0500
 Subject: Re: [iText-questions] Perfomance Question - ByteArray vs Files

 I'm asking because I've been on servers where applications did everything
 in memory, causing both the dreaded JVM out of memory error and the server
 to bog down. I'm not as concerned about marginal performance benefits as I
 am about being a good server citizen and keeping my resource consumption
 down. I'm not a java guru and am unfamiliar with what goes on under the
 covers.

The first law of computer science is: once the disk light comes on, it's
time for a coffee break. If you want to learn how to keep the disk light
off, go to sun.com and look at things like profiling tools and jhat, IIRC.
Again, OT for iText but a common concern.

 I quite agree that one needs to look at the system, not just a piece of
 it. Yes, I am optimizing our ORACLE database for speed, using stored
 procedures, indexing, etc.

 So

 Is my assumption wrong about the PDFReader taking the whole thing into
 memory? If it is taking the whole thing in, then I may as well create the
 PDF in memory in the first place and hook both passes together (assuming
 this is possible).

AFAIK the source code is still open; that is usually the best way to get a
helpful understanding, and often the original authors have forgotten details.
If you dump your heap, some things may jump out at you. This could also be a
reminder that dumping things to disk and the clever use of compression
or concise representations could help even if conversions are frequent or
somewhat expensive.

 If PDFReader doesn't do that, then I'm leaning more towards the File side of
 things so that if I get a large output from the DB I won't bog down the
 server. I expect that most PDFs will be a few pages but my users have been
 known to make strange requests.

As always, I would suggest reviewing the need to create the PDF in the first
place if HTML or raw text would do. And you can look at some parameters,
like the number of things you get back from the DB, and pick an implementation.
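
One way to read "look at some parameters and pick an implementation" is sketched below: the first pass writes to memory when the database result is small and to a file when it is large. The class name, the row-count threshold, and the temp-file handling are all assumptions for illustration; the right cut-off has to come from measurements on the actual server.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical strategy picker: memory for small jobs, disk for large ones.
final class OutputStrategy {

    // Assumed threshold; tune it from real measurements.
    static final int IN_MEMORY_ROW_LIMIT = 5_000;

    static boolean useMemory(int rowCount) {
        return rowCount <= IN_MEMORY_ROW_LIMIT;
    }

    // Returns the stream the first pass should write its PDF into.
    static OutputStream open(int rowCount, Path spillFile) throws IOException {
        if (useMemory(rowCount)) {
            return new ByteArrayOutputStream();
        }
        return new BufferedOutputStream(Files.newOutputStream(spillFile));
    }

    public static void main(String[] args) throws IOException {
        Path spill = Files.createTempFile("firstpass", ".pdf");
        try {
            try (OutputStream small = open(120, spill)) {
                System.out.println(small.getClass().getSimpleName()); // ByteArrayOutputStream
            }
            try (OutputStream large = open(250_000, spill)) {
                System.out.println(large.getClass().getSimpleName()); // BufferedOutputStream
            }
        } finally {
            Files.deleteIfExists(spill);
        }
    }
}
```

Because the writer side only needs an OutputStream, the rest of the first pass does not have to know which choice was made; only the code that feeds the second pass does.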

But again, you need some empirical measures of wall clock time:
simply printing the Java millisecond time diffs may give you a good
idea of what the bottleneck is. If you have multithreaded code there could
be all kinds of resource contention or CPU spinning etc.; again, it could be
anything that limits your performance. If the user requests are highly
redundant, you may benefit from caching fonts or intermediate results like
headers, etc.
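
A minimal sketch of caching intermediate results when requests are redundant. The FragmentCache class, the string key, and the byte[] payload are hypothetical; what is worth caching (rendered headers, parsed fonts, query results) depends on the application.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical cache: builds each fragment once and reuses it afterwards.
final class FragmentCache {

    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();
    private final Function<String, byte[]> builder;
    private final AtomicInteger builds = new AtomicInteger();

    FragmentCache(Function<String, byte[]> builder) {
        this.builder = builder;
    }

    // Returns the cached fragment for key, building it on first use only.
    byte[] get(String key) {
        return cache.computeIfAbsent(key, k -> {
            builds.incrementAndGet();
            return builder.apply(k);
        });
    }

    // Number of times the expensive builder actually ran.
    int buildCount() {
        return builds.get();
    }

    public static void main(String[] args) {
        // The lambda stands in for an expensive step such as rendering a header.
        FragmentCache headers = new FragmentCache(
                title -> ("HEADER:" + title).getBytes(StandardCharsets.UTF_8));
        headers.get("Monthly report");
        headers.get("Monthly report");
        headers.get("Invoice");
        System.out.println(headers.buildCount()); // 2
    }
}
```

An unbounded map like this only suits a small, stable set of keys; anything larger needs an eviction policy, or the cache becomes its own memory problem.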

 - Original Message -
 From: Mike Marchywka 
 To: 
 Sent: Friday, March 19, 2010 4:47 PM
 Subject: Re: [iText-questions] Perfomance Question - ByteArray vs Files

 From:
 To: itext-questions@lists.sourceforge.net
 Date: Fri, 19 Mar 2010 16:16:14 -0500
 Subject: [iText-questions] Perfomance Question - ByteArray vs Files

 I'm creating a PDF in two passes with my goal to
 end up with it as a file on the server. The first pass creates the PDF
 and
 the second adds things like headers, footers, etc. using PDFStamper. The
 PDF is being generated from a database so there is a possibility that it
 could
 get to be large (a few hundred pages?).

 This really has nothing to do with itext but some people have discussed
 performance
 issues and indeed the inner itext implementations may want to vary
 depending
 on what the user can say apriori about some sizes etc. ( for large
 tasks, spending some time up front picking a strategy or specific
 implementation
 can pay off). And, of course, I'm a perennial complainer about the
 resources
 related to the PDF file versus alternatives.

 First, it may really help if you profile whatever you have- if there is
 anything slower than something called PDF, a highly loaded DB could be it.
 Do you keep requesting the same (static) data from it? Etc.

 Of course, trying to do everything in memory sounds faster until you
 find out that your memory is virtual and you keep thrashing. If
 you want to rely on the OS, great, but if you think you can do better
 you may benefit from reading/writing to disk the stuff you want
 instead of making a huge heap and letting the VM system deal with it.
 Once you are all in physical memory, you want to try
 to keep locality and stay in a lower-level memory cache (hard with Java).
 On some large data sets in other settings, I have used a sort (yes,
 another slow thing) to stop memory thrashing, and the speed improvement
 was an order of magnitude (from essentially unusable to quite tolerable).

 So, I guess the most authoritative answer is, it depends.

 Right now I have the PdfWriter directing the output
 to a FileOutputStream. Once that is done, the PdfReader picks it up and
 is used in PdfStamper to process and send the PDF to the server
 using another FileOutputStream

Re: [iText-questions] (no subject)

2010-03-18 Thread Mike Marchywka

 From: lrose...@adobe.com
 To: itext-questions@lists.sourceforge.net
 Date: Thu, 18 Mar 2010 13:16:42 -0700
 Subject: Re: [iText-questions] (no subject)

 PDF doesn’t support a “table structure” – you will need to apply
 advanced heuristics to figure out what is (or isn’t) a table and what are its
 “header”, “columns”, etc.
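
 One simple heuristic of the kind described above is to group text fragments
 by baseline position: fragments at nearly the same height form a row, and
 within a row they are ordered left to right. The sketch below is only an
 illustration under stated assumptions, not an iText API. The Fragment type,
 the groupRows method, and the tolerance value are all hypothetical, and the
 coordinates are presumed to come from whatever text extractor is in use.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: group positioned text fragments into table rows. */
final class RowGrouper {

    /** A text fragment with the page coordinates of its origin (hypothetical type). */
    static final class Fragment {
        final double x;
        final double y;
        final String text;

        Fragment(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    /**
     * Fragments whose y is within yTolerance of the first fragment of the
     * current row join that row; each row is then sorted left to right.
     */
    static List<List<String>> groupRows(List<Fragment> fragments, double yTolerance) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        // PDF user space has y growing upward, so the largest y is the top of the page.
        sorted.sort(Comparator.comparingDouble((Fragment f) -> -f.y));

        List<List<Fragment>> rows = new ArrayList<>();
        for (Fragment f : sorted) {
            List<Fragment> last = rows.isEmpty() ? null : rows.get(rows.size() - 1);
            if (last != null && Math.abs(last.get(0).y - f.y) <= yTolerance) {
                last.add(f);
            } else {
                List<Fragment> row = new ArrayList<>();
                row.add(f);
                rows.add(row);
            }
        }

        List<List<String>> result = new ArrayList<>();
        for (List<Fragment> row : rows) {
            row.sort(Comparator.comparingDouble((Fragment f) -> f.x));
            List<String> cells = new ArrayList<>();
            for (Fragment f : row) {
                cells.add(f.text);
            }
            result.add(cells);
        }
        return result;
    }
}
```

 Deciding which row is the header, and which cells line up into columns,
 still needs document-specific assumptions (for example, treating the first
 row as the header and matching cells by x position).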

LOL, I've found that swearing and mashing the keyboard help too.
I would suggest you reiterate your comments to me about asking authors to
retain logical structure if they want to turn information into a work of art.

btw, whoever suggested that webkit based html to pdf converter saved me a lot
of work- I was able to drop that into an immediate problem ( I will eventually
remove the pdf component, I just needed a way to get a list of
all the resources needed by a web page and I had dug into webkit but
didn't have an easy to use front end). Thanks.

 Leonard

 From: Ahmad Amin
 [mailto:ahmad_a...@siliconexpert.com]
 Sent: Thursday, March 18, 2010 5:17 PM
 To: itext-questions@lists.sourceforge.net
 Subject: [iText-questions] (no subject)

 Hi

 I'm trying to extract PDF text content automatically.

 The problem is that when I encounter text in different table structures, I
 can't differentiate between headers and column values.

 I'm using Eclipse as my Java IDE and the most popular PDF libraries (JPedal,
 iText, PDFOne Java, PDFBox). All these libraries extract text fine but don't
 give me the capability to detect a PDF table in table format (headers and
 columns).

 So I will appreciate any help from your side

 thanks

_
Hotmail: Trusted email with powerful SPAM protection.
http://clk.atdmt.com/GBL/go/210850553/direct/01/
--
Download Intel® Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
___
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions

Buy the iText book: http://www.1t3xt.com/docs/book.php
Check the site with examples before you ask questions: 
http://www.1t3xt.info/examples/
You can also search the keywords list: http://1t3xt.info/tutorials/keywords/

Re: [iText-questions] adobe error

2010-03-15 Thread Mike Marchywka

 Date: Mon, 15 Mar 2010 08:36:16 +0100
 From: i...@1t3xt.info
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] adobe error

 Stephen gallaghan wrote:
 Hi all
 I am producing PDF documents using java but am gettin some errors with
 adobe acrobat reader 8 but not 9
 does anybody know of some software that will report what the error is?

 No, but I have a TV that doesn't work anymore,
 do you know what could be wrong?

 No, you can't because I'm not saying what is broke.
 I could have forgotten to plug the TV (and without electricity
 the TV doesn't work). Or the TV could be working, and the problem
 could be a broken remote control.

 So please don't post question saying am gettin some errors,
 be more specific and tell us which errors ur gettin.


I think he is asking for a diagnostic tool, not an answer, and presumably
the tool would work with a broad range of problems,

 does anybody know of some software that will report what the error is?

This comes up from time to time; I usually suggest the open source renderer.
You will need to instrument it yourself, but it is reasonably easy to follow.


Re: [iText-questions] Can IText Acheive this ?

2010-03-10 Thread Mike Marchywka

 2. Can I edit/save this pdf ?

 A form created by iText can not be saved locally by a
 student/professor using Adobe Reader (= why the word
 online is so important).
 The data can be entered and submitted to a server.
 On the server, the form can be filled with that data
 and that filled out form can be saved.

 If you need a solution where the form can be saved
 locally, you have to Reader Enable the form, and that's
 only possible with Adobe software.


Given that this is a targeted audience, can't you also distribute
a custom reader from the open source project that is always enabled?
What is so special about the process of saving locally that
requires a special option during document creation?
Also, since many people assume that PDF is a standard with
support from multiple companies, why does the answer
"you can only do that with Adobe products" come up so often?
Is there an alternative, even if it involves some digging or writing?
Adobe may have a lot of things, but none of this should be magic
or inherently impossible for others to support.

This is one thing that has always bothered me about PDF.
OK, fine, you can call it a standard and say it is portable
and not vendor-specific, but the realistic options for getting
tools seem to be limited in some cases.
Thanks.


  

Re: [iText-questions] Integration a 3D CAD drawing into a PDF

2010-03-05 Thread Mike Marchywka

 Date: Wed, 3 Mar 2010 23:17:47 -0800
 From: j...@infolox.de
 To: itext-questions@lists.sourceforge.net
 Subject: [iText-questions] Integration a 3D CAD drawing into a PDF


 Hello,

 I'm a rookie in iText and Java and need some help. I am trying to integrate a 3D
 CAD drawing into a PDF. You can see the code below. Unfortunately, it shows the
 artwork from the bottom view. I don't understand the impact of the
 parameters on the view. Is there any description with examples of how to use
 iText to solve that problem? Could anybody help me with that problem?


I was hoping Leonard would have replied to this- didn't you post this same thing
the other day? In fact, I think when I claimed that PDF is just a bunch of pixels,
he pointed me to an example in which a PDF contained a 3D model of something.
I thought he might explain how you could put the model, not a 2D view, into the
PDF...



 Thank you in advance!

 Julia

 package com.lowagie.toolbox.plugins;

 import java.io.*;

 import javax.swing.*;

 import com.lowagie.text.*;
 import com.lowagie.text.pdf.*;
 import com.lowagie.toolbox.AbstractTool;
 import com.lowagie.toolbox.arguments.*;
 import com.lowagie.toolbox.arguments.filters.PdfFilter;
 import com.lowagie.toolbox.arguments.filters.U3DFilter;

 import java.net.*;

 /**
  * This tool lets you add an embedded u3d 3d annotation to the first page of
  * a document. Look for sample files at
  * http://u3d.svn.sourceforge.net/viewvc/u3d/trunk/Source/Samples/Data/
  * @since 2.1.1 (imported from itexttoolbox project)
  */
 public class Add3D extends AbstractTool {
     static {
         addVersion("$Id: Add3D.java 3373 2008-05-12 16:21:24Z xlv $");
     }

     FileArgument destfile = null;
     public static final String PDF_NAME_3D = "3D";
     public static final String PDF_NAME_3DD = "3DD";
     public static final String PDF_NAME_3DV = "3DV";
     public static final String PDF_NAME_3DVIEW = "3DView";
     public static final String PDF_NAME_C2W = "C2W";
     public static final String PDF_NAME_IN = "IN";
     public static final String PDF_NAME_MS = "MS";
     public static final String PDF_NAME_U3D = "U3D";
     public static final String PDF_NAME_XN = "XN";

     /**
      * This tool lets you add an embedded u3d 3d annotation to the first page
      * of a document.
      */
     public Add3D() {
         super();
         menuoptions = MENU_EXECUTE | MENU_EXECUTE_SHOW;
         FileArgument inputfile = new FileArgument(this, "srcfile",
                 "The file you want to add the u3d File", false,
                 new PdfFilter());
         arguments.add(inputfile);
         FileArgument u3dinputfile = new FileArgument(this, "srcu3dfile",
                 "The u3d file you want to add", false,
                 new U3DFilter());
         arguments.add(u3dinputfile);
         StringArgument pagenumber = new StringArgument(this, "pagenumber",
                 "The pagenumber where to add the u3d annotation");
         pagenumber.setValue("1");
         arguments.add(pagenumber);
         destfile = new FileArgument(this, "destfile",
                 "The file that contains the u3d annotation after processing",
                 true, new PdfFilter());
         arguments.add(destfile);
         inputfile.addPropertyChangeListener(destfile);
     }

     /**
      * Creates the internal frame.
      *
      */
     protected void createFrame() {
         internalFrame = new JInternalFrame("Add3D", true, true, true);
         internalFrame.setSize(300, 80);
         internalFrame.setJMenuBar(getMenubar());
         System.out.println("=== Add3D OPENED ===");
     }

     /**
      * Executes the tool (in most cases this generates a PDF file).
      *
      */
     public void execute() {
         try {
             if (getValue("srcfile") == null) {
                 throw new InstantiationException(
                         "You need to choose a sourcefile");
             }
             if (getValue("srcu3dfile") == null) {
                 throw new InstantiationException(
                         "You need to choose a u3d file");
             }
             if (getValue("destfile") == null) {
                 throw new InstantiationException(
                         "You need to choose a destination file");
             }
             int pagenumber = Integer.parseInt(
                     (String) getValue("pagenumber"));
             // Create 3D annotation
             // Required definitions
             PdfIndirectReference streamRef;
             PdfIndirectObject objRef;
             PdfReader reader = new PdfReader(((File) getValue("srcfile"))
                     .getAbsolutePath());

             String u3dFileName = ((File) getValue("srcu3dfile"))
                     .getAbsolutePath();
             PdfStamper stamp = new PdfStamper(reader, new FileOutputStream(
                     (File) getValue("destfile")));

             /*Add Infos to HashMap
             HashMap info = reader.getInfo();
             info.put("Author", "infolox");
             stamp.setMoreInfo(info);
             stamp.insertPage(reader.getNumberOfPages(),
                     reader.getPageSize(pagenumber));*/
             PdfWriter wr = stamp.getWriter();
             PdfContentByte cb = stamp.getUnderContent(pagenumber);
             Rectangle rectori = reader.getCropBox(pagenumber);
             /*Rectangle rect = new Rectangle(new Rectangle(100,
                     rectori.getHeight() - 550, rectori.getWidth() - 100,
                     rectori.getHeight() - 150));
             */
             Rectangle rect = new Rectangle(new Rectangle(55,
                     rectori.getHeight() - 675, rectori.getWidth() - 55,
                     rectori.getHeight() - 175));

             PdfStream oni = new PdfStream(PdfEncodings.convertToBytes(
                     "runtime.setCurrentTool(\"Rotate\");", null));
             oni.flateCompress();

             // Create stream to carry attachment
             PdfStream stream = new PdfStream(new
                     FileInputStream(u3dFileName),
                     wr);

Re: [iText-questions] Performance improvement to PdfGraphics2D

2010-03-05 Thread Mike Marchywka

 Date: Thu, 4 Mar 2010 10:16:30 -0800
 From:
 To: itext-questions@lists.sourceforge.net
 Subject: [iText-questions] Performance improvement to PdfGraphics2D

 Hello,

 I was using iText to convert a JTable to a PDF. This was consuming a
 large amount of memory and taking a long time, so I did some memory
 profiling and have attached a patch that significantly improves
 performance for us. The following describes what I found, and what the
 patch does:

 When printing a JTable, you have to construct a lot of
 child PdfGraphics2D objects. For each child, the following happens:

 1. A BufferedImage is created just so that we can get a regular
 Graphics2D. This Graphics2D object may never be used, so I patched
 PdfGraphics2D to construct it only if needed.

But ctor calls are order-0 (humour). Yes, garbage generation can be a big deal,
and besides the ultimate GC problems that may not show up in profiling (when the
GC thread executes, it doesn't show up in your stack trace), initialization code
can take forever because, well, everything is initialized, including large
arrays (you don't get initialized memory for free even if it is still one line
of source code), etc.
Usually you see warnings about this with string manipulations,
since temps aren't always appreciated and can become significant in a hurry.
I also remember being shocked at the startup time in some apps that
were cleaned up to be more OO- I'm really not sure if anyone cares about init
resources... In C++ there is some hope the compiler can fix a lot
of OO overhead, but things are worse in Java.

Once terms like graphics start to appear, the attention goes to inner
loops and cool terms related to getting pixels onto the screen. With
Java, the optimization in this native code can make all the surrounding
stuff an important time sink.
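
The "construct it only if needed" idea from item 1 can be sketched with a
small lazy holder. This is only an illustration under stated assumptions, not
the actual PdfGraphics2D patch: the Lazy class and its method names are
hypothetical, and the sketch is not thread-safe.

```java
import java.util.function.Supplier;

/** Hypothetical sketch: defer building an expensive helper until first use. */
final class Lazy<T> {
    private final Supplier<T> factory;
    private T value;          // stays null until the first get()
    private boolean created;  // tracks creation separately so a null value is allowed

    Lazy(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Builds the value on the first call and returns the same instance afterwards. */
    T get() {
        if (!created) {
            value = factory.get();
            created = true;
        }
        return value;
    }

    /** Reports whether the factory has run yet. */
    boolean isCreated() {
        return created;
    }
}
```

A child object that never calls get() never pays for the allocation or the
initialization, which is the saving the patch description is after.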

 2. Two arrays of PdfGState are created, but are then replaced with
 the parent's arrays. I patched PdfGraphics2D to create these arrays in
 the non-private constructor. You might want to consider using the
 clone() method instead of keeping that private constructor around. The
 normal .clone() behaviour is very similar to what you have done
 manually in the .create() method.

 Finally, I noticed that the AWT PathIterator.currentSegment(float[])
 method creates a double[] internally. That is because the float[]-based
 method just passes through to the double[]-based method. I modified
 your use of the PathIterator to take this into account.
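
 A minimal sketch of walking a path with the double[]-based overload and one
 reusable buffer, assuming (as reported above for the implementation that was
 profiled) that the float[] overload allocates a temporary array internally.
 SegmentCounter is a hypothetical name used for illustration; only the
 java.awt.geom calls are standard.

```java
import java.awt.geom.PathIterator;

/** Hypothetical sketch: iterate a path with a single reusable double[] buffer. */
final class SegmentCounter {

    /** Returns segment counts indexed by type, SEG_MOVETO (0) through SEG_CLOSE (4). */
    static int[] countSegments(PathIterator it) {
        int[] counts = new int[5];
        double[] coords = new double[6]; // allocated once, reused for every segment
        while (!it.isDone()) {
            int type = it.currentSegment(coords); // double[] overload
            counts[type]++;
            it.next();
        }
        return counts;
    }
}
```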

 Can this patch be included in the next release?

 Also, I am working on a commercial product. Can you clarify for me
 whether or not iText PDF can be included as a .jar in our commercial
 (non-open-source) product? I cannot remember whether using a .jar is
 considered a derivative work or not. If we're allowed to use it, then I
 will probably do a little more work on improving performance of iText
 and will send that on.

 Thanks,

 Peter.


Re: [iText-questions] Converting files to PDF and TIFF

2010-03-03 Thread Mike Marchywka

 Date: Wed, 3 Mar 2010 19:36:53 +0600
 From: kasun0...@gmail.com
 To: iText-questions@lists.sourceforge.net
 Subject: [iText-questions] Converting files to PDF and TIFF

 Hi all,
 I am new to iText. I am developing a Java application where a method takes a
 list of files that can be of type .doc, .txt, .rtf, .html, .TIFF, .odt. The
 list of files is iterated over and each one is converted and added to a
 single TIFF file. The new TIFF file is then returned.

 Another method takes a list of files that can be of type .doc, .txt,
 .rtf, .html, .TIFF, .odt. The list of files is iterated over and each one is
 converted and added to a PDF file.
 The PDF file is returned.

 Will I be able to do this using only iText, or do I need to use any other
 third-party library for this purpose?

JAI comes up sometimes, depending on what images you really end up wanting.
I've been pushing an open source renderer to help diagnose your PDF results.
Have you looked at the OpenOffice source code? I would think there may be a bit
in there for some conversions, but I'm not sure what all it actually does or
doesn't do.

 If you have any suggestions or thoughts please fill me with them.

 Thanks & Best Regards

 Kasun


Re: [iText-questions] PDF opened in Java shows 'Error Page' in Title

2010-03-01 Thread Mike Marchywka

 Date: Sun, 28 Feb 2010 07:55:06 -0800
 From: sandys...@yahoo.com
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] PDF opened in Java shows 'Error Page' in Title

 Hi Mike,

 The .jar is always built into the .ear for the application and deployed to
 the server.

 The original WL8 version used iText-1.4.7.jar. I did not have to make any
 significant code changes when I went from iText-1.4.7.jar to
 iText-2.1.7.jar, while there were quite a few changes going to
 iText-5.0.1.jar. Since WL10 runs under Java 1.6, and the 5.0.1 version is
 written in Java 5, I was hoping the Adobe 9 issues were resolved in
 iText-5.0.1.jar.

Well, it doesn't sound like a gross problem- the insignificant changes
would be good suspects, I guess.

 I wish it was easier to narrow down the problem in the iText.jar. Since no
 errors are thrown and I have not been able to pinpoint the difference in the
 documents I was hoping somebody had already experienced the problem.

It's hard to know if iText even thinks there is a problem.
I guess a debug jar could help. I have gotten into the habit of using the
C++ preprocessor with Java and can make builds with various features
or for different target platforms (even Java is not entirely platform
independent).

Really this could be anything- an invalid input image, a messed up font, etc.
If you can identify what the reader is complaining about, you should be able
to narrow down the possible code issues.
It is possible you could just dump the PDF as an ASCII file and visually
compare the ASCII to a known good one, or see if the high bit is
now being reset, using a binary dump utility etc.
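
The byte-level comparison suggested above could look something like the
following sketch. ByteCheck and its two methods are hypothetical helpers, not
part of iText or any PDF tool. The idea that a file with no high-bit bytes at
all may have been passed through a 7-bit transfer is a rule of thumb, not a
definitive test.

```java
/** Hypothetical sketch: compare a suspect file against a known good one, byte by byte. */
final class ByteCheck {

    /** Returns the offset of the first differing byte, or -1 if the arrays are identical. */
    static int firstDifference(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        // Same prefix: identical if the lengths match, otherwise they differ at the shorter length.
        return a.length == b.length ? -1 : n;
    }

    /** Counts bytes with the high bit (0x80) set. */
    static int countHighBitBytes(byte[] data) {
        int count = 0;
        for (byte b : data) {
            if ((b & 0x80) != 0) {
                count++;
            }
        }
        return count;
    }
}
```

The two arrays could be loaded with java.nio.file.Files.readAllBytes, one from
the PDF as received by the client and one from the copy written on the server.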

 Thanks...
 S

 Mike Marchywka-2 wrote:

 Date: Sat, 27 Feb 2010 11:12:04 -0800
 From: sandys...@yahoo.com
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] PDF opened in Java shows 'Error Page' in
 Title

 Hi Mike,

 The code has been under my control through the changes so I can confirm
 that no configuration changes were made. Also all the tests were run on

 Changing versions of something often creates small config changes.
 This could be a classpath order or some really subtle thing- unlikely,
 but I'm pointing out issues. Normally you can't just change a jar
 file when the API has changed and assume all the methods are the same,
 and it sounds like you never tried to recompile your app against the new
 iText jars. I usually try to have a build of something designed to run on
 a server that also runs from the command line, so it is easier to test.

 the same server so that is also not likely to be the problem. However, my
 PC has Adobe 9 installed which was the client when I ran the tests
 described below. Then later, I tested it on a PC with Adobe 8 installed and
 I did not see any of the errors.

 This could be an Adobe 9 related problem. Does anybody know of a fix for
 it?

 If you want to approach this empirically and look at properties of the
 final result, version-specific Adobe problems can only be answered by an
 insider like Leonard- I'm not sure if either version has any means to get a
 detailed error report or report back to Adobe when it finds an error
 (obviously I try not to use them LOL).
 I'm not sure there is a good pdf-dump tool likely to
 point to the questionable code inside the PDF, but maybe someone can
 suggest one.
 But, again, if you can narrow down the
 problem item and it seems to be isolated, you may be able to find the
 iText code responsible and post that.

 Thanks
 S

 Mike Marchywka-2 wrote:

 Date: Fri, 26 Feb 2010 11:53:27 -0800
 From: sandys...@yahoo.com
 To: itext-questions@lists.sourceforge.net
 Subject: [iText-questions] PDF opened in Java shows 'Error Page' in
 Title

 This may be an application specific problem. If anybody has experienced
 it before, please let me know.

 I am in the process of upgrading a java web application from Weblogic8
 to Weblogic10. The application renders invoices in PDF format via
 iText.jar.

 There have been some cases where web servers treat PDF as ASCII
 and clear the high bit. You wouldn't want to rule out a change in
 configuration, so if you can diff config files that may be worthwhile
 ( if not for this specific problem more generally ). Some people have
 complained about quirks or problems using their code with servlets etc
 and it isn't hard to write code that only works for a very idiosyncratic
 server setup.

 The original WL8 version used iText-1.4.7.jar

 For the WL10 version I first tried iText-2.1.7.jar, saw the errors I am
 about to describe in this post, and so am now trying out iText-5.0.1.jar but
 still see the same errors. The invoices (1 or 2 page documents) are all
 rendered correctly, however, some of the invoices are rendered with the title
 displayed as:

 Billng App - Error Page (only this line shows)
 https://... url

Re: [iText-questions] PDF opened in Java shows 'Error Page' in Title

2010-02-28 Thread Mike Marchywka

Date: Sat, 27 Feb 2010 11:12:04 -0800
From: sandys...@yahoo.com
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] PDF opened in Java shows 'Error Page' in Title

Hi Mike,

The code has been under my control through the changes so I can confirm
that no configuration changes were made. Also all the tests were run on

Changing versions of something often creates small config changes.
This could be a classpath order or some really subtle thing- unlikely,
but I'm pointing out issues. Normally you can't just change a jar
file when the API has changed and assume all the methods are the same,
and it sounds like you never tried to recompile your app against the new
iText jars. I usually try to have a build of something designed to run on a
server that also runs from the command line, so it is easier to test.

the same server so that is also not likely to be the problem. However, my
PC has Adobe 9 installed which was the client when I ran the tests described
below. Then later, I tested it on a PC with Adobe 8 installed and I did
not see any of the errors.

This could be an Adobe 9 related problem. Does anybody know of a fix for
it?

If you want to approach this empirically and look at properties of the final
result, version-specific Adobe problems can only be answered by an insider like
Leonard- I'm not sure if either version has any means to get a detailed error
report or
report back to adobe when it finds an error ( obviously I try not to use them
LOL) .
I'm not sure there is a good pdf-dump tool likely to
point to the questionable code inside the pdf but maybe someone can suggest
one.
But, again, if you can narrow down the
problem item and it seems to be isolated, you may be able to find the itext code
responsible and post that.

Thanks
S

Mike Marchywka-2 wrote:

Date: Fri, 26 Feb 2010 11:53:27 -0800
From: sandys...@yahoo.com
To: itext-questions@lists.sourceforge.net
Subject: [iText-questions] PDF opened in Java shows 'Error Page' in Title

This may be an application specific problem. If anybody has experienced
it
before, please let me know.

I am in the process of upgrading a java web application from Weblogic8 to
Weblogic10. The application renders invoices in PDF format via iText.jar.

There have been some cases where web servers treat PDF as ASCII
and clear high bit. You wouldn't want to rule out a change in
configuration
so if you can diff config files that may be worthwhile ( if not for this
specific problem more generally ). Some people have complained about
quirks or problems using their code with servlets etc and it isn't hard
to write code that only works for a very idiosyncratic server setup.

The original WL8 version used iText-1.4.7.jar

For the WL10 version I first tried iText-2.1.7.jar saw the errors I am
about
to describe in this post and so am now trying out iText-5.0.1.jar but
still
see the same errors. The invoices (1 or 2 page documents) are all
rendered
correctly, however, some of the invoices are rendered with the title
displayed as:

Billng App - Error Page (only this line shows)
https://... url to the page

The title should display:

https://... url to the page

Error Page is the default title for the application's errorpage.jsp

Can you trace the code far enough to get something related to itext?
Did you recompile against new itext or just replace jar files?
Often catching Exception is done when Throwable is more comprehensive.
In particular note that this does not derive from exception,

http://java.sun.com/j2se/1.4.2/docs/api/java/lang/NoSuchMethodError.html
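
A minimal sketch of the point about Exception versus Throwable: an Error such
as NoSuchMethodError skips a catch (Exception) clause, so a swapped jar with a
changed API can fail without that handler ever logging anything. CatchDemo and
its run method are hypothetical names used only for illustration.

```java
/** Hypothetical sketch: show which catch clause sees a given failure. */
final class CatchDemo {

    /** Runs the task and reports how it ended. */
    static String run(Runnable task) {
        try {
            task.run();
            return "ok";
        } catch (Exception e) {
            // Ordinary checked and runtime exceptions arrive here.
            return "Exception: " + e.getClass().getSimpleName();
        } catch (Throwable t) {
            // Errors such as NoSuchMethodError do not extend Exception and arrive here.
            return "Throwable: " + t.getClass().getSimpleName();
        }
    }
}
```

In an application that only has the first clause, the Error would propagate
to whatever outer handler exists, which could plausibly be a generic error
page.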

If the application is throwing any errors, it is not apparent as there
are no errors in the logs or any visible differences in the PDF document. It
is always the same documents that display the error.

Please let me know if you need more information, code extracts or screen
shots.

Your responses to this would be much appreciated.

Thanks!
S

--
View this message in context:
http://old.nabble.com/PDF-opened-in-Java-shows-%27Error-Page%27-in-Title-tp27722675p27722675.html
Sent from the iText - General mailing list archive at Nabble.com.


Re: [iText-questions] PDF opened in Java shows 'Error Page' in Title

2010-02-27 Thread Mike Marchywka

Date: Fri, 26 Feb 2010 11:53:27 -0800
From: sandys...@yahoo.com
To: itext-questions@lists.sourceforge.net
Subject: [iText-questions] PDF opened in Java shows 'Error Page' in Title

This may be an application specific problem. If anybody has experienced it
before, please let me know.

I am in the process of upgrading a java web application from Weblogic8 to
Weblogic10. The application renders invoices in PDF format via iText.jar.

There have been some cases where web servers treat PDF as ASCII
and clear high bit. You wouldn't want to rule out a change in configuration
so if you can diff config files that may be worthwhile ( if not for this
specific problem more generally ). Some people have complained about
quirks or problems using their code with servlets etc and it isn't hard
to write code that only works for a very idiosyncratic server setup.

The original WL8 version used iText-1.4.7.jar

For the WL10 version I first tried iText-2.1.7.jar saw the errors I am about
to describe in this post and so am now trying out iText-5.0.1.jar but still
see the same errors. The invoices (1 or 2 page documents) are all rendered
correctly, however, some of the invoices are rendered with the title
displayed as:

Billng App - Error Page (only this line shows)
https://... url to the page

The title should display:

https://... url to the page

Error Page is the default title for the application's errorpage.jsp

http://java.sun.com/j2se/1.4.2/docs/api/java/lang/NoSuchMethodError.html

If the application is throwing any errors, it is not apparent as there are
no errors in the logs or any visible differences in the PDF document. It is
always the same documents that display the error.

Please let me know if you need more information, code extracts or screen
shots.

Your responses to this would be much appreciated.

Thanks!
S


Re: [iText-questions] Extracting a page as an Image

2010-02-24 Thread Mike Marchywka

 Date: Wed, 24 Feb 2010 12:04:12 +0100
 From: jan.lendh...@vevention.com
 To: itext-questions@lists.sourceforge.net
 Subject: [iText-questions] Extracting a page as an Image

 Hi there,

 Just a short question, as I did not find anything on google or I used the 
 wrong search phrases.

 I would like to extract the whole first page of a pdf file as an image (png 
 or jpeg or so).

Again, I think the pdf tool kit of xpdf stuff from foolabs works. Also, this
was what I did with the open source renderer, as I earlier had an interest in
getting OCR samples.
Just skimming my history file, there is something called pdfimages, and
I can also find the command lines where I used a modified open source renderer
to do the same thing. As I recall that wasn't too difficult, but I think I
archived all that code.
It does seem I tried to use imagemagick; I can't remember if that worked.

http://www.google.com/#hl=en&safe=off&q=pdfimages&aq=f&aqi=g10&aql=&oq=

http://www.foolabs.com/xpdf/

and the final solution, 

https://pdf-renderer.dev.java.net/

 Is this possible or are there any examples out there?

 Thanks a lot in advance,

 Jan


Re: [iText-questions] Extracting a page as an Image

2010-02-24 Thread Mike Marchywka

 Date: Wed, 24 Feb 2010 05:08:44 -0800
 From: wasegra...@bellsouth.net
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] Extracting a page as an Image

 - Original Message 
 From: wasegraves 
 To: Post all your questions about iText here 
 Sent: Wed, February 24, 2010 7:58:36 AM
 Subject: Re: [iText-questions] Extracting a page as an Image

 - Original Message 
 From: Mike Marchywka 
 To: itext-questions@lists.sourceforge.net
 Sent: Wed, February 24, 2010 6:38:04 AM
 Subject: Re: [iText-questions] Extracting a page as an Image
 
 Date: Wed, 24 Feb 2010 12:04:12 +0100
 From: jan.lendh...@vevention.com
 To: itext-questions@lists.sourceforge.net
 Subject: [iText-questions] Extracting a page as an Image
 ...
 Just a short question, as I did not find anything on google or I used the 
 wrong search phrases.
 ...
 I would like to extract the whole first page of a pdf file as an image (png 
 or jpeg or so).

 ...
 It does seem I tried to use imagemagick, can't remember if that worked.

 It worked when I used it to convert an AcroForm to a JPEG image.

 You realize, of course, that this is not an iText question. That said, you 
 could wrap a PDF in an Image object with iText. There are ample examples in 
 the book to show you how this is done.

These questions keep coming up and the tools should be of general interest to 
people trying to use itext.



 Best regards,
 Bill Segraves




Re: [iText-questions] Extracting a page as an Image

2010-02-24 Thread Mike Marchywka

 Date: Wed, 24 Feb 2010 14:31:51 +0100
 From: jmr...@gmail.com
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] Extracting a page as an Image

 Hello,

 I did this some years ago:

 Use pdftk (based on itext) to extract the page(s) and then convert it to 
 image using ghostscript.

 pdftk is at http://www.accesspdf.com/pdftk/

 ghostscript is at http://www.gnu.org/software/ghostscript/

Bruno et al, I think you could save time with a page of tools somewhere on the
itext site, similar to the FAQ, plus an acronym list (THNTDWI: this has nothing
to do with itext, for example).

We have several approaches: pdftk, foolabs xpdf, imagemagick (which I just
checked works with convert x.pdf y.jpg), and my own additions to the open
source renderer link I posted earlier (I had to fish this out of an archive;
I only had to add a couple of classes and chop off the gui stuff).
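
For anyone who wants to drive the imagemagick route from Java, the following is a minimal sketch rather than a tested recipe. It assumes ImageMagick's convert is on the PATH and that Ghostscript is installed, since ImageMagick delegates PDF input to it. The class and method names are made up for illustration; the only tool-specific part is the x.pdf[0] suffix, which selects the first page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds (and optionally runs) "convert x.pdf[0] y.jpg".
final class PageToImage {
    /** Builds the ImageMagick command line; pageIndex is zero-based. */
    static List<String> convertCommand(String pdfPath, int pageIndex, String imagePath) {
        List<String> cmd = new ArrayList<>();
        cmd.add("convert");
        cmd.add(pdfPath + "[" + pageIndex + "]"); // [n] picks a single page of the PDF
        cmd.add(imagePath);                       // output format follows the extension
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = convertCommand("x.pdf", 0, "y.jpg");
        System.out.println(String.join(" ", cmd));
        // Only launch the external tool when explicitly asked to.
        if (args.length > 0 && args[0].equals("--run")) {
            int exit = new ProcessBuilder(cmd).inheritIO().start().waitFor();
            System.out.println("exit code " + exit);
        }
    }
}
```

Building the argument list separately from running it keeps the part that depends on an installed tool optional.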

 Best regards
 Jose



Re: [iText-questions] Retrieve Fonts from an existing PDF?

2010-02-23 Thread Mike Marchywka

 Date: Tue, 23 Feb 2010 09:26:10 +0100
 From: i...@1t3xt.info
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] Retrieve Fonts from an existing PDF?

 Nirmal Fernando wrote:
 How can I retrieve Fonts of an existing PDF? Is there a way in iText to
 read the fonts embedded and replace them with a different type of font??

 This isn't necessarily impossible, but in most cases it's very difficult
 and in some cases it's very unwise.

 For instance: what if only a subset of the font is present in the PDF?
 Then you'll never be able to retrieve the full font, and will you
 replace this subset with a full font or a corresponding subset?

 Do you know anything about encoding? Do you know anything about CMaps?
 Do you know anything about the differences in metrics?

 For instance: the width of the words Foobar Film Festival is 178.74 pt
 in Helvetica, but only 157.90 in Times-Roman for the same font size
 (12). In other words: if you replace Helvetica with Times-Roman, you'll
 screw up your entire layout. Remember that PDF is NOT a Word processing
 format; every glyph is positioned at a predictable location. If you want
 to change the font, you need to do the layout all over again. That is:
 recreate the PDF from scratch.

I guess I'd just ask how hard it would have been for the original author to
include enough information, either with a standard or an agreed-upon
private convention, for the OP to do any required re-layout. That is,
what would be involved in creating the original pdf with enough logical
structure to make it likely you could accommodate a font change (or
just extract the words, which are often the only thing people care about,
not fonts and columns etc.)?

This is just a variant of my recurring rant (in many cases we want information,
not pictures, to feed other computer programs), to which you contribute a good
point: fonts are complicated, and human readability adds a lot of stuff.
A human audience is of course perfectly valid, and there is nothing wrong with
using graphics to make things easier for the reader. However, having a short,
tractable alphabet, rather than, say, words composed of an unordered and
unbounded collection of things, or even 24-bit color pictures, is
a big asset in organizing information. Once you start adding stuff it can
be confusing unless you take some care to separate the information from the artwork.

 --
 This answer is provided by 1T3XT BVBA
 http://www.1t3xt.com/ - http://www.1t3xt.info


Re: [iText-questions] After rotation, images are getting displced in iText PDF

2010-02-23 Thread Mike Marchywka

Date: Mon, 22 Feb 2010 20:33:10 -0800
From: ra...@vinfotech.com
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] After rotation, images are getting displced in
iText PDF

Hi,

It's very urgent. We need help in this. Looking towards positive support.

If this is that urgent,
can't you just kluge a solution by translating the output, or is this more
involved than a one-line call? I thought the conclusion was that something
to that effect was the best itext solution.
Designing an API is always a tradeoff, and it is possible there is no center
parameter anywhere.
Am I right if I say that this is more like a Math question (algebra)
than an iText question?
--
This answer is provided by 1T3XT BVBA

rhul_rk wrote:

Yes, you are right, there is a lot of calculation and algebra in this. But
each object plotted in the PDF exists in an imaginary (hidden)
rectangle. When we plot an image without rotation, it is plotted at the
correct location, but when we rotate an object, it gets displaced in the PDF.
As stated earlier, each object exists in the imaginary (hidden)
rectangle, which has a Plane and an Absolute height and width. On
rotation, the absolute height and width change but the Plane height and width
remain the same. These are the height and width of the imaginary (hidden)
rectangle.

Say we are rotating the object in iTextSharp with the help of
Image.RotationDegree=45. It rotates the object by 45 degrees about the
bottom-left corner of the imaginary (hidden) rectangle instead of the actual
image's bottom-left corner. We just want to ask: is there any
mechanism in iTextSharp to rotate an object about its center point instead of
the bottom-left corner?

Thanks for your quick reply. Looking forward to your support.

Thanks,
Rahul

1T3XT info wrote:

rhul_rk wrote:
One more thing we would like to mention here: in the flex application, items
rotate about the center point, while in the iTextSharp PDF items rotate about
the lower-left corner. Is there any sort of mechanism in iTextSharp to rotate
the image about its center point by a given number of degrees?

Hope we are clear with our issue.

Am I right if I say that this is more like a Math question (algebra)
than an iText question?
--
This answer is provided by 1T3XT BVBA
http://www.1t3xt.com/ - http://www.1t3xt.info

--
View this message in context:
http://old.nabble.com/After-rotation%2C-images-are-getting-displced-in-iText-PDF-tp27683306p27698368.html
Sent from the iText - General mailing list archive at Nabble.com.


Re: [iText-questions] After rotation, images are getting displced in iText PDF

2010-02-23 Thread Mike Marchywka


It looks like it was in your response. Hotmail has been
a big problem, executing text that happens
to look like html even with text mode on, but in this
case it seems it made it out, since it made it back.





 From: ra...@vinfotech.com
 Mike, thanks for trying to post the answer, but I got an empty reply from
 you. It might have happened accidentally. I will be glad to receive the
 answer.

 Thanks!!!
 Rahul Khadikar
 -Original Message-
 From: Mike Marchywka [mailto:marchy...@hotmail.com]
 Sent: Tuesday, February 23, 2010 4:42 PM
 It's very urgent. We need help in this. Looking towards positive support.

 If this is that urgent,
 can't you just kluge a solution by translating the output or is this more
 involved than a one-line call? I thought the conclusion was something
 to that effect being the best itext solution.
 Designing an API is always a tradeoff, and it is possible there is no
 center parameter
 anywhere



 Am I right if I say that this is more like a Math question (algebra)
 than an iText question?
 --
 This answer is provided by 1T3XT BVBA



  

Re: [iText-questions] Using Images extracted from a pdf

2010-02-23 Thread Mike Marchywka

 Date: Tue, 23 Feb 2010 06:52:54 -0800
 From: fernandogomes...@hotmail.com
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] Using Images extracted from a pdf


 can anyone help-me one more time..
 i dont know what i do ..

 I need to get the image bytes, now decoded...

Probably the open source pdf renderer would answer your questions and provide
more context. I seem to recall it was pretty easy to modify to extract page
images in your favorite format; the included images are probably extracted in
the process of rendering.





 String colorSpace = pdfStrem.get(PdfName.COLORSPACE).toString();
 String filter = pdfStrem.get(PdfName.FILTER).toString();
 int bits = Integer.valueOf(pdfStrem.get(PdfName.BITSPERCOMPONENT).toString());
 int width = Integer.valueOf(pdfStrem.get(PdfName.WIDTH).toString());
 int height = Integer.valueOf(pdfStrem.get(PdfName.HEIGHT).toString());
 PdfDictionary param = (PdfDictionary)pdfStrem.get(PdfName.DECODEPARMS);
 int colors = Integer.valueOf(param.get(PdfName.COLORS).toString());
 int predictor = Integer.valueOf(param.get(PdfName.PREDICTOR).toString());
 int colums = Integer.valueOf(param.get(PdfName.COLUMNS).toString());
 if (filter.equals("/FlateDecode"))
 {
 // inflate the raw stream bytes; the result is still raw samples, not a PNG
 byte[] bytesDecod = PdfReader.FlateDecode(bytes);
 }

 These are all the pieces of information that I can get out of the PDF.

 What do I have to do to create my image in general?
 I'm trying to do it, and to learn, but it is hard; all my attempts have failed.
 ty


 Fernando Gomes wrote:

 Sirs, really sorry for duplicating, can delete other topics ?
 so sorry ..:blush:

 very thkx for help..
 and so good fast help ..
 i will estudy more ..


 Leonard Rosenthol-3 wrote:

 You are assuming that PDF maintains the PNG nature of the image - that is
 NOT the case. PDF only supports two kinds of images JPEG (which is why
 this works) and raw bitmaps (aka an array of bits). So in your case,
 with the PNG, it is transcoded into the latter case and so if you want it
 back you will need to reverse the process on your end.



 for this response in other same email :blush:
 quote of 1T3XT info below ..

 really thanks. I must have seen the realance the chapter that you
 mentioned, I will read again and very carefully. My English is very weak,
 and it is very difficult to read.

 you are very funny, I laughed a lot. I know I deserved the scolding.
 Really thanks for your help. I will test and then come back to post the
 result.
 Thank you!


 1T3XT info wrote:

 Fernando Henrique Gomes wrote:
 the problem is when I insert an image in PNG format and then try to get
 the same...

 OK, we're talking about a PNG.
 If you've read chapter 10 of the 2nd edition of iText in Action,
 you know that PNGs are transformed into zipped pixels.
 If you didn't know, you should read the book!

 on here i try to take that image...

 [code]
 int XrefIndex =((PRIndirectReference)obj).getNumber();
 PdfObject pdfObj = pdf.getPdfObject(XrefIndex);
 PdfStream pdfStrem = (PdfStream)pdfObj;
 byte[] bytes =
 PdfReader.getStreamBytesRaw((PRStream)pdfStrem);
 if ((bytes != null)) {
 String fileName = "Image_P" + pageNumber + "_";
 File file = new File(fileName);
 FileOutputStream fw = new FileOutputStream(file);
 fw.write(bytes);
 fw.flush();
 fw.close();
 BufferedImage img2 = ImageIO.read(file);
 com.lowagie.text.Image img =
 com.lowagie.text.Image.getInstance(file.toURL());
 }
 [/code]

 img2 returned a null 

 Of course, why do you think that would work???

 in line of img .. has a Excpetion
 Image_P1_ is not a recognized imageformat

 Of course, you're sending iText a bunch of pixels,
 but: what are the dimensions of the image,
 how many bits are there per component?

 when i try to do :
 [code]
 Image image = Toolkit.getDefaultToolkit().createImage(bytes);
 [/code]

 and before create an image from this image getting the width and height
 from my PdfStream (create a buffered and draw the image)
 when i serialize on a file and visualize this.. this image in a fucking
 black picture .. all black -.-

 It's because you don't have a fucking clue about what you're doing :P
 Hehe, I was waiting for an occasion to use the F* word on the list.
 Thanks!

 if i use JPEG encode for my images.. all the 3 solution i have .. its
 ok.. have effects..

 Well, that's because iText stores JPEGs literally as a JPEG without
 changing any of the bytes. If you look inside, you'll see that the
 filter is DCTDecode (Discrete Cosine Transform).

 i can vizualize my images how to i create then .. perfect..
 but if i change de JPEG ... for any other encode.. thats not have efect
 ..

 No idea what you're saying here, but you also need to study images.

 can any help-me plz ?

 This example doesn't involve iText, but explains what you're missing.

 Let's create an image byte per byte:

 byte b[] = new byte[256 * 3];
 for (int i = 0; i < 256; i++) {
 b[i * 3] = (byte) (255 - i);
 b[i * 3 + 1] = (byte) (255 - i);
 b[i * 3 + 2] = (byte) i;
 }
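
 The example above is cut off in the archive, so what follows is not a reconstruction of it. It is only a sketch of one way such a byte array could be turned into a java.awt image, assuming 8 bits per component, three components per pixel in R, G, B order, no predictor, and data that has already been inflated. The class and method names are made up for illustration.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper: interpret a byte array as packed 8-bit RGB samples.
final class RawRgbToImage {
    /** Same 256-pixel gradient as the loop above: one row, three bytes per pixel. */
    static byte[] gradient() {
        byte[] b = new byte[256 * 3];
        for (int i = 0; i < 256; i++) {
            b[i * 3] = (byte) (255 - i);
            b[i * 3 + 1] = (byte) (255 - i);
            b[i * 3 + 2] = (byte) i;
        }
        return b;
    }

    /** Builds an image from row-major RGB bytes; width and height come from the image dictionary. */
    static BufferedImage fromRgb(byte[] rgb, int width, int height) {
        if (rgb.length < width * height * 3) {
            throw new IllegalArgumentException("not enough sample data");
        }
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int i = (y * width + x) * 3;
                int r = rgb[i] & 0xFF;      // mask: Java bytes are signed
                int g = rgb[i + 1] & 0xFF;
                int b = rgb[i + 2] & 0xFF;
                img.setRGB(x, y, (r << 16) | (g << 8) | b);
            }
        }
        return img;
    }

    public static void main(String[] args) {
        BufferedImage img = fromRgb(gradient(), 256, 1);
        System.out.println(img.getWidth() + "x" + img.getHeight());
    }
}
```

 The point is that the bytes alone are not an image: width, height, bits per component and the color space all have to be supplied from the PDF image dictionary.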

Re: [iText-questions] Using Images extracted from a pdf

2010-02-23 Thread Mike Marchywka





You can always use the command line tools in pdf toolkit or xpdf;
I can't remember which, but there is something like
pdf2image, similar to pdf2text for extracting text (in xpdf the
tools are called pdfimages and pdftotext).

 Date: Tue, 23 Feb 2010 12:43:28 -0800
 From: fernandogomes...@hotmail.com
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] Using Images extracted from a pdf


 I'm going crazy with this. As you can see, I have never manipulated images at
 such a low level, and I do not have much sense of how things work. I have been
 searching for days to finish my solution, and I'm already getting stressed.
 I keep testing one method, and then trying another.
 -.-

 Can you give me some more assistance on how I can turn this array of bytes
 back into an image?

 Couldn't the API have just one class that did this? :P

 Pdfimages buf = new pdfimages (myRawImageByteArray);
 buf.getAsBufferedImage ();

 : P

 if you say you can not help me all right, but I can indicate a content in
 which I can rely on to get this done?

 thanks.


 Leonard Rosenthol-3 wrote:

 The image is decompressed and then injected into the PDF. Same with
 EVERY TYPE of image EXCEPT JPEG.

 -Original Message-
 From: Fernando Gomes [mailto:fernandogomes...@hotmail.com]
 Sent: Tuesday, February 23, 2010 3:21 PM
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] Using Images extracted from a pdf


 ty ..

 I have a question.
 When I insert an image that is not a JPEG,
 what exactly happens to it?

 Say it is a PNG: is it decompressed before being injected into the PDF?

 Or does it keep its PNG format, with the bytes encoded with FlateEncode,
 so that it is just a matter of finding the filter and decoding to get it back?

 And if the image is decompressed before being inserted into the PDF, how do I
 know which encoding the image had?


 Leonard Rosenthol-3 wrote:

 Bits per pixel is the BitsPerComponent value in the image object

 Pixels per line (POR LINHA) is _NOT_ Width * bits. It's Width *
 NumComponents, where NumComponents is based on the colorspace in question
 (eg. RGB == 3, CMYK == 4).
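
 As a small worked example of that arithmetic, the sketch below computes samples and bytes per row. It assumes the PDF convention that each row of sample data starts on a byte boundary, so partial bytes are rounded up. The class and method names are hypothetical.

```java
// Hypothetical helper for sizing the rows of a decoded PDF image.
final class ImageRowSize {
    /** Samples per row: width times the number of color components (RGB = 3, CMYK = 4). */
    static int samplesPerRow(int width, int numComponents) {
        return width * numComponents;
    }

    /** Bytes per row, rounded up because every row starts on a byte boundary. */
    static int bytesPerRow(int width, int numComponents, int bitsPerComponent) {
        return (samplesPerRow(width, numComponents) * bitsPerComponent + 7) / 8;
    }

    public static void main(String[] args) {
        System.out.println(bytesPerRow(100, 3, 8)); // 8-bit RGB, 100 pixels wide
        System.out.println(bytesPerRow(10, 1, 1));  // 1-bit grayscale, 10 pixels wide
    }
}
```

 For an 8-bit image this value is the scanline stride to use when the inflated bytes are wrapped in a raster.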

 -Original Message-
 From: Fernando Gomes [mailto:fernandogomes...@hotmail.com]
 Sent: Tuesday, February 23, 2010 2:00 PM
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] Using Images extracted from a pdf




 public static BufferedImage createBufferedImageFromRawBytes(byte[]
 bytes,int width, int height, int bits) throws BadElementException,
 MalformedURLException, IOException {
 com.lowagie.text.Image img =
 com.lowagie.text.Image.getInstance(bytes);

 DataBuffer db = new DataBufferByte (img.getRawData(),
 img.getRawData().length);

 WritableRaster raster = Raster.createPackedRaster(db, //DATA BUFFER
 width, //LARGURA
 height, //ALTURA
 width*bits, //LARGURA * BITS POR PIXEL = PIXEL POR
 LINHA
 -scanlineStride
 // bits, //BITS POR PIXEL -pixelStride
 new int [] {bits},

 null);

 ColorSpace cs = ColorSpace.getInstance (img.getColorspace());
 ColorModel cm = new ComponentColorModel(cs, false, false,
 Transparency.OPAQUE, db.getDataType());
 BufferedImage bi = new BufferedImage (cm, raster, false, null);
 return null;
 }



 this code is up to where I could get, but there are variables that I know
 of
 to generate bufferedImage, please someone help me see if I'm on track.
 If I write something wrong.




Re: [iText-questions] Writing to ServletOutputStream

2010-02-22 Thread Mike Marchywka

 Date: Mon, 22 Feb 2010 11:45:43 +0100
 From:
 To: itext-questions@lists.sourceforge.net
 Subject: [iText-questions] Writing to ServletOutputStream

 Hi,

 Please note that I posted links to pastie.org for better readability of the 
 code snippets.

 I am using iText to print graphs produced with the JUNG Framework to pdfs.
 To achieve this I have the following code:

 http://www.pastie.org/private/rsy6wzneedpvo4dai3vcgw

 Writing the graphics Object to the pdf is done by the following code:

 http://www.pastie.org/private/zgwocpvjih16j2cmcdmza

 The produced ByteArrayOutputStream is used to save the content to a file 
 (works great - I get a wonderful pdf):

 http://www.pastie.org/private/imxi9cmdrzowop9ivxgnba

 The reason why I am generating a ByteArrayOutputStream is that I additionally 
 want to write the created pdf content to a ServletOutputStream:

 http://www.pastie.org/private/r4h2lad26xbwjokoh0zbq

 unfortunately the only thing I get is a PDF document in the desired dimension 
 but blank - no content :( I am using almost the same code for writing text 
 content to a ServletOutputStream and I do get the content - so I think the 
 code of the response is ok. Is there a problem of writing 
 ByteArrayOutputStream content containing iText data to ServletOutputStreams? 
 It is really weird that everything works when I write the 
 ByteArrayOutputStream content to a FileOutputStream and don't get anything 
 when I write it to the ServletOutputStream :(

I didn't hit the links and I'm not sure what you mean by blank, but do you set
the server's content type to something telling your browser it is pdf? If you
hit your server with something more diagnostic than artistic, like wget instead
of IE, you can at least see what it thinks is going on: is the byte count
right, etc. You may even be able to do some diffs and determine if there is
truncation or corruption. Also check the servlet debugging information, which
you hopefully generate :)
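
If no diff tool is handy, the comparison can be done in a few lines of Java. This is only a sketch: it assumes you have saved the working file output and the wget download side by side, and the class and method names are made up for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: find where two byte arrays first disagree.
final class ByteDiff {
    /**
     * Returns the index of the first differing byte, the length of the shorter
     * array if one is a truncated copy of the other, or -1 if they are identical.
     */
    static int firstDifference(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return a.length == b.length ? -1 : n;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: ByteDiff good.pdf downloaded.pdf");
            return;
        }
        byte[] good = Files.readAllBytes(Paths.get(args[0]));
        byte[] bad = Files.readAllBytes(Paths.get(args[1]));
        System.out.println("sizes " + good.length + " vs " + bad.length
                + ", first difference at " + firstDifference(good, bad));
    }
}
```

Equal sizes with an early difference point to corruption; a match up to the end of the shorter file points to truncation.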

 It would be great if you could take a look at my code.

 Thank you in advance!

 Sebastian Furth


Re: [iText-questions] Writing to ServletOutputStream

2010-02-22 Thread Mike Marchywka

 Date: Mon, 22 Feb 2010 14:56:45 +0100
 From: i...@1t3xt.info
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] Writing to ServletOutputStream

 Sebastian Furth wrote:
 Something happens during the request because it returns a pdf in the
 dimension set in the code - but there is absolutely no content (blank).

 Sounds like the blank page problem described in the book (1st and 2nd
 edition). This happens if you shave the upper bit from every byte. The
 PDF structure is preserved, and as a result a viewer can show you all
 the pages of the PDF, the bookmarks, etc... But all binary data, for
 instance the page content stream, is made corrupt (of course: you've
 thrown away 1/8 of the information).

 If that's what's happening in your case, you have a configuration error
 somewhere.


cygwin has an octal dump utility (od, iirc); the first few lines of output
would at least let you know if that is the problem. Offhand I don't know who
would assume you have text, but again you'd have to suspect a wrong content
type somewhere. A ByteArray of course is supposed to be just that, with no text
assumption, but if someone manipulates the bytes there could be sign issues;
all kinds of things could happen. I just made a funny post on the cygwin-talk
list about that bit being used for parity only, LOL.
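
The same check can be sketched in Java if od is not available. The class and method names below are hypothetical. The idea is simply to print the first bytes in hex and count how many bytes have the upper bit set: a real PDF with compressed streams should have plenty, while a copy with the upper bit shaved off will have none.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for eyeballing whether binary data lost its 8th bit.
final class HighBitCheck {
    /** Counts bytes whose upper bit is set (values 0x80 to 0xFF). */
    static int countHighBitBytes(byte[] data) {
        int count = 0;
        for (byte b : data) {
            if ((b & 0x80) != 0) {
                count++;
            }
        }
        return count;
    }

    /** Hex dump of the first n bytes, separated by spaces. */
    static String hexHead(byte[] data, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(n, data.length); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", data[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] header = "%PDF-1.4".getBytes(StandardCharsets.US_ASCII);
        System.out.println(hexHead(header, 4));
        System.out.println(countHighBitBytes(header));
    }
}
```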




Re: [iText-questions] Writing to ServletOutputStream

2010-02-22 Thread Mike Marchywka

Date: Mon, 22 Feb 2010 15:14:55 +0100
From: sebastian.fu...@googlemail.com
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] Writing to ServletOutputStream

Thanks for your reply!

Is it possible that the ServletOutputStream shaves the upper bit from every
byte?

I have a method which returns a ByteArrayOutputStream containing the pdf data.
If I delegate this OutputStream to a FileOutputStream everything is ok, but if
I use the ServletOutputStream there is no content in the pdf.

If the ServletOutputStream is responsible for this, do you have an idea how I
can prevent it from doing this?

The javadocs claim it is for binary data, but maybe you are using the wrong
method, manipulating bytes as char, or have subclassed it; I dunno.

http://www.google.com/#hl=en&safe=off&q=site%3Asun.com+ServletOutputStream&aq=f&aqi=&oq=&fp=d95f0d161f018361
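
To make the "manipulating bytes as char" remark concrete, here is a minimal sketch of how binary data gets damaged once something treats it as 7-bit text. The class and method names are made up for illustration and are not part of iText or the servlet API; the only assumption is a JVM with the standard charsets.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class HighBitDemo {
    // Hypothetical helper: simulates a 7-bit channel by clearing bit 7 of every byte.
    static byte[] stripHighBit(byte[] in) {
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = (byte) (in[i] & 0x7F);
        }
        return out;
    }

    // Treats binary data as text: decode as US-ASCII, then encode again.
    // Bytes above 0x7F are not valid ASCII and come back as '?' (0x3F).
    static byte[] roundTripAsAscii(byte[] in) {
        String asText = new String(in, StandardCharsets.US_ASCII);
        return asText.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // "%PDF" followed by a few bytes with the high bit set, as found in binary streams.
        byte[] binary = {0x25, 0x50, 0x44, 0x46, (byte) 0xE2, (byte) 0xE3, (byte) 0xCF};
        System.out.println("stripped equals original:   "
                + Arrays.equals(binary, stripHighBit(binary)));
        System.out.println("round trip equals original: "
                + Arrays.equals(binary, roundTripAsAscii(binary)));
    }
}
```

In both cases the plain ASCII parts (the PDF keywords and structure) survive while the binary parts do not, which would be consistent with a PDF that opens but shows blank pages.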

Thank you in advance!

Best regards.

Sebastian Furth

2010/2/22 1T3XT info

Sebastian Furth wrote:

Something happens during the request because it returns a pdf in the
dimension set in the code - but there is absolutely no content (blank).

Sounds like the blank page problem described in the book (1st and 2nd
edition). This happens if you shave the upper bit from every byte. The
PDF structure is preserved, and as a result a viewer can show you all
the pages of the PDF, the bookmarks, etc... But all binary data, for
instance the page content stream, is made corrupt (of course: you've
thrown away 1/8 of the information).

If that's what's happening in your case, you have a configuration error
somewhere.


Re: [iText-questions] Writing to ServletOutputStream

2010-02-22 Thread Mike Marchywka



LOL, I don't imagine this will help, but the high bit is probably being set to
zero. Whoever is creating all those content attach/disposition things probably
doesn't know it is not text.

marchywka:/home/marchywka# od -ax  Desktop/Car-Diagnosis_Visualization.pdf | 
sed -e 's/ /\n/g' | grep ^$ | cut -c 1 | sort | uniq -c
    152 0
    146 1
    273 2
   1537 3
    202 4
    194 5
    327 6
    215 7
marchywka:/home/marchywka# 
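
For anyone without od and sed handy, roughly the same check can be sketched in Java. This is not a translation of the pipeline above: it counts the top nibble of every byte rather than of each 16-bit word that od prints, and the class name is made up. The idea is the same though: if nibbles 8 through f never show up in a file that should contain binary streams, the high bit has probably been cleared.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class NibbleHistogram {
    // Counts how often each value 0..15 occurs as the top four bits of a byte.
    static int[] highNibbleCounts(byte[] data) {
        int[] counts = new int[16];
        for (byte b : data) {
            counts[(b & 0xFF) >>> 4]++;
        }
        return counts;
    }

    // True when no byte has bit 7 set, i.e. nibbles 8..15 never occur.
    static boolean highBitNeverSet(byte[] data) {
        int[] counts = highNibbleCounts(data);
        for (int nibble = 8; nibble < 16; nibble++) {
            if (counts[nibble] != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Usage (path is whatever file you saved): java NibbleHistogram some.pdf
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        int[] counts = highNibbleCounts(data);
        for (int nibble = 0; nibble < 16; nibble++) {
            System.out.printf("%7d %x%n", counts[nibble], nibble);
        }
        System.out.println("high bit never set: " + highBitNeverSet(data));
    }
}
```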





 Date: Mon, 22 Feb 2010 15:30:21 +0100
 From: sebastian.fu...@googlemail.com
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] Writing to ServletOutputStream

 Of course they were good hints - the problem is me :)

 OK, I'll try to explain my problem:

 I have created a JSP servlet which should return a PDF document on request.

 //Get the file content
 ByteArrayOutputStream bstream = 
 de.d3web.empiricalTesting.caseVisualization.jung.JUNGCaseVisualizer.getInstance().getByteArrayOutputStream(t.getRepository());


 //Response
 response.setContentType("application/pdf");
 response.setHeader("Content-Disposition", 
 "attachment;filename=\"" + filename + "\"");

 response.setContentLength(bstream.size());

 //Write the data from the ByteArray to the ServletOutputStream of the response
 bstream.writeTo(response.getOutputStream());
 response.flushBuffer();


 The pdf document is created by iText and should contain a graph (Graphics2D 
 Object).

 init(cases);

 int w = vv.getGraphLayout().getSize().width;
 int h = vv.getGraphLayout().getSize().height;


 ByteArrayOutputStream bstream = new ByteArrayOutputStream();
 Document document = new Document();

 try {

 PdfWriter writer =
 PdfWriter.getInstance(document, bstream);

 document.setPageSize(new Rectangle(w, h));
 document.open();

 PdfContentByte cb = writer.getDirectContent();
 PdfTemplate tp = cb.createTemplate(w, h);
 Graphics2D g2 = tp.createGraphics(w, h);

 paintGraph(g2);

 g2.dispose();
 tp.sanityCheck();
 cb.addTemplate(tp, 0, 0);
 cb.sanityCheck();

 document.close();


 } catch (DocumentException e) {
 Logger.getLogger(this.getClass().getName())
 .warning("Error while writing to file. The file was not created. " + 
 e.getMessage());

 }

 return bstream;

 If I delegate the ByteArrayOutputStream created in the method posted above to 
 a FileOutputStream the pdf has the desired content - but if I delegate it to 
 a ServletOutputStream the content (the Graphics2D object) is missing.


 I attached the pdf where the Graphics is missing. Maybe you can get some 
 information out of it.

 Thank you in advance!

 Best regards

 Sebastian Furth


 2010/2/22 1T3XT info

 Sebastian Furth wrote:

 Once again, thanks for your reply. Unfortunately I think I don't have
 enough experience to understand your hints :)

 They were good hints though; I thought everybody knew wget.

 If possible, can you explain your problem as well as Mike explained how
 to use wget? For instance: save the PDF on your local system and open it
 using a text editor such as Notepad++, Wordpad,... What do you see?




Re: [iText-questions] RandomAccessFileOrArray file load in memory

2010-02-16 Thread Mike Marchywka

 Date: Mon, 15 Feb 2010 07:07:50 -0800
 From:
 To: itext-questions@lists.sourceforge.net
 Subject: [iText-questions] RandomAccessFileOrArray file load in memory

 Hello,

 I'm confronted with a big problem of out-of-memory errors on the server when
 a lot of users want to get a PDF file from a TIFF file.

This may be considered OT for itext, and maybe someone has a better answer,
but after all my comments about resource usage re pdf machinations, the short
answer is: see if anyone at sun.com has similar issues and solutions for other
server things and try subbing them into itext for your own needs,
unless Bruno has canned implementation alternatives. For requests that just
take a long time, you may have to change your paradigm and notify the user later
via email or something when the result is done. These issues are not specific
to itext or pdf.

 I use Itext 1.2.7

 Is it possible to Override the RandomAccessFileOrArray for replace the byte
 arrayIn[] by a temporary file ?

Generally memory management in java is quite limited, and long before you run
out of memory you would want to do things like maximize low level cache hits,
etc.
However, there may be something on sun.com as this is likely to
be a common issue when you scale java apps ( I've never bothered to look
myself but it wouldn't just be about itext) and you have the source code
so you can take alt approaches. Code is never really platform independent
and implementation details make or break real-world utility ( hence
issues with pdf resource needs and benefits).

Also note that if all the
users are translating the same image, in-memory caching of single objects,
not duplicated hundreds of times, can be a big savings. You need a sharing
mechanism in this case. A scalable itext or something like that would probably
be a commercial product :)
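
As a rough illustration of the sharing mechanism, here is a minimal sketch of a cache that converts each distinct input only once and hands the same result to every requester. Everything in it is hypothetical: the class name, the String key (imagine a file name or content hash), and the converter function standing in for the real TIFF-to-PDF step. It also assumes a much newer Java than iText 1.2.7 would have run on, and an unbounded map like this trades conversion time for heap, so a real one would need an eviction policy.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class SharedConversionCache {
    private final ConcurrentHashMap<String, byte[]> cache = new ConcurrentHashMap<>();
    private final Function<String, byte[]> converter;   // stand-in for the real conversion
    final AtomicInteger conversions = new AtomicInteger(); // how often the converter actually ran

    SharedConversionCache(Function<String, byte[]> converter) {
        this.converter = converter;
    }

    // Returns the cached result for this key, running the converter only on the first request.
    byte[] get(String key) {
        return cache.computeIfAbsent(key, k -> {
            conversions.incrementAndGet();
            return converter.apply(k);
        });
    }

    public static void main(String[] args) {
        SharedConversionCache cache =
                new SharedConversionCache(name -> ("pdf-for-" + name).getBytes());
        cache.get("scan-001.tif");
        cache.get("scan-001.tif");
        cache.get("scan-002.tif");
        System.out.println("conversions run: " + cache.conversions.get());
    }
}
```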

Assuming you have zero virtual memory right now, this is just going to slow
things down even more ( presumably your current out of memory condition has
already been addressed with increased heap size to the point of doing a lot of
VM thrashing ), and it could get to the point where each request takes forever
as the whole system thrashes between requests ( you can probably write a simple
equation to determine the number of executing requests given the arrival rate
and processing time, with proc time increasing with the number of active
requests). You might just be better off limiting the number of active requests,
queuing the rest, and notifying the user when done, if currently you are trying
to return a complete pdf to the user via the requesting http connection.
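
The usual starting point for that simple equation is Little's law, L = lambda * W: the average number of requests in progress equals the arrival rate times the average time each one takes. Once W grows with L, things run away. Limiting the active requests and queuing the rest could be sketched as below; the class name, the pool size and the sleep standing in for a conversion are all made up, and this is only an illustration of the idea, not something iText provides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedConversionQueue {
    // Submits `requests` fake conversions to a pool of `maxActive` threads and
    // returns the highest number that were ever running at the same time.
    static int runAndMeasurePeak(int maxActive, int requests) {
        ExecutorService pool = Executors.newFixedThreadPool(maxActive);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> pending = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            pending.add(pool.submit(() -> {
                int now = active.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(10); // stand-in for the expensive conversion
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                active.decrementAndGet();
            }));
        }
        try {
            for (Future<?> f : pending) {
                f.get(); // a real server would notify the user here instead of blocking
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrency: " + runAndMeasurePeak(2, 8));
    }
}
```

Requests beyond the pool size wait in the executor's internal queue, so memory use is bounded by the pool size rather than by the arrival rate.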

 or use temporary file when the file is more than 5 ko for example.

 Thank you,
 --
 View this message in context: 
 http://old.nabble.com/RandomAccessFileOrArray-file-load-in-memory-tp27595180p27595180.html
 Sent from the iText - General mailing list archive at Nabble.com.


Re: [iText-questions] Unable To Set Checkboxes With Complicated Names

2010-02-14 Thread Mike Marchywka

 From: lrose...@adobe.com
 To: itext-questions@lists.sourceforge.net
 Date: Sun, 14 Feb 2010 07:03:21 -0800
 Subject: Re: [iText-questions] Unable To Set Checkboxes With Complicated Names

 Actually, I would think you would LIKE the fact that the IRS switched to 
 XFA-based forms! This means that it's all XML-based, but inside of the simple 
 PDF wrapper. So now you can quite easily get all the information you need - 
 layout, data, etc. No muss, no fuss.

 Yes, PDF forms support submission of data in a variety of formats - HTML, 
 FDF, XFDF, XML (custom grammars) and PDF. It's up to the author of the form 
 to choose which they want, based on the system they are integrating with. The 
 IRS DOES allow submission of just data - that's how vendors such as Quicken, 
 TurboTax, etc. do their electronic filings. However, you need to establish 
 a trusted relationship with the IRS in order to be able to do this, due to 
 reasonable concerns about DoS attacks, etc.

Yeah well right now I want to pay my taxes and not have to either pay anyone or
type numbers into something from which I can not get them back out. It wasn't
entirely a complaint, although I'm glad you responded, just to reiterate that
some customers do use pdf as a data sink.
I haven't looked recently since I had a hard time extracting text from the
instructions a while back and I could not find a free way to send in the forms.
I'm not sure why you need any more trust posting XML than writing a script
to browse their website ( are they really concerned about DoS attacks LOL?)
or doing a credit card transaction over the internet. Certainly there are some
issues with user confusion, but still this seems to be restricted artificially
beyond what similar transactions do for security.
It would be just as easy for them to send you back a copy for verification
( is this what you really meant?), like most transactions do, rather than push
a front end that happens to look like paper and then make approved e-paper
pushers.
Quite simply, as you list the approved vendors, this looks like a business
decision as much as anything.

 FYI: The SEC supports submission in a variety of formats.

Generally they are dealing with structured submissions designed for data
extraction. Personally I've never sent them anything and I have no idea what
the front ends look like, but I can go get plain text and some limited
XBRL filings. It would be nice to go to the IRS and get back similar results
for myself, for example, not just a bunch of pixels, and I have no idea what
they may be thinking here for the future.

 Leonard


Re: [iText-questions] Unable To Set Checkboxes With Complicated Names

2010-02-13 Thread Mike Marchywka

 To: itext-questions@lists.sourceforge.net
 From: 
 Date: Thu, 11 Feb 2010 15:09:24 +
 Subject: Re: [iText-questions] Unable To Set Checkboxes With Complicated Names

 1T3XT info  1t3xt.info writes:

 Jun Zuo wrote:
 Hi!

 I hope that someone can tell me how to set checkboxes with names like:

 topmostSubform[0].Page1[0].Line6cTable[0].#subform[1].c1_07[0]

 The index of subform is 1 not 0.
 Are you sure you are trying to fill a static XFA form.
 It seems to me, you're working with a dynamic form.
 (But I could be wrong.)

 This is the 2009 Form 1040 from the IRS. I think it is a dynamic form! What is
 the trick for a dynamic form?

This rant does eventually relate back to PDF architecture, but you have
to make it through the whole post LOL.

The IRS forms are one of the things that brought me here, and now GA is making
it hard to get paper forms any more. Do you know if there is a secure place
where you can just post an XML file to the IRS instead of all this formatting
junk left over from the days of paper? The data itself is quite simple and they
have to separate it anyway; it would be easier just to have the home user press
a submit button that extracts the form data and posts that back to the IRS.
So I guess the question is, isn't there a way to design
a pdf form such that you only submit the DATA back to the author ( in
this case the IRS) instead of all the format junk that they already have?

Presumably this would let anyone submit tax data easily using the tool of their
choice, even if free, with no loss of security. I can type in the few kb of
numbers using (free) notepad and then post using wget. Most people of course
wouldn't do that but they probably want to import and export their numbers. If
you design your docs to allow the numbers to go in and out anyway, why can't
you just send the numbers back to the people who wanted them in the first
place?

For example, the SEC accepts submissions of similar data in something
called XBRL format, which is purely machine readable, with no indication of
any interest in making copies of paper or stone tablets. While I guess
you can call pdf a standard, it inherently interlinks the pictures with
the numbers of real interest, making it hard to do some simple things.


Re: [iText-questions] FW: Mail to iTextSharp

2010-02-11 Thread Mike Marchywka

 Date: Thu, 11 Feb 2010 08:25:24 +0100
 From: i...@1t3xt.info
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] FW: Mail to iTextSharp

 suphat phuenpha wrote:
 If I want to convert a color PDF files to black and white or gray.
 What can I do that ?

 You can't.

Isn't this just a matter of finding all the color tables or models and changing
them to grey?
When you say "can't", do you mean there is nothing in the API that does this in
a few lines, or that it is fundamentally impossible ( without, say, rendering
to pixels and making new PDF pages out of color-modified pixels)?
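
For the pixel fallback, the color-modifying step itself is tiny. The sketch below converts packed ARGB pixels to grey using the common Rec. 601 luma weights (0.299, 0.587, 0.114); those weights and the class name are my assumptions, not anything from iText, and this says nothing about how to rewrite color spaces inside an existing PDF.

```java
class GrayPixels {
    // Converts one packed ARGB pixel to grey, keeping its alpha value.
    static int toGray(int argb) {
        int a = (argb >>> 24) & 0xFF;
        int r = (argb >>> 16) & 0xFF;
        int g = (argb >>> 8) & 0xFF;
        int b = argb & 0xFF;
        // Weighted sum of the channels (Rec. 601 luma), rounded to the nearest integer.
        int y = (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);
        return (a << 24) | (y << 16) | (y << 8) | y;
    }

    // Converts a whole pixel buffer, returning a new array.
    static int[] toGray(int[] pixels) {
        int[] out = new int[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            out[i] = toGray(pixels[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] grey = toGray(new int[] {0xFFFF0000, 0xFF00FF00, 0xFF0000FF});
        for (int p : grey) {
            System.out.printf("%08X%n", p);
        }
    }
}
```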

 --
 This answer is provided by 1T3XT BVBA
 http://www.1t3xt.com/ - http://www.1t3xt.info


Re: [iText-questions] pdf does not receive image from a servlet

2010-02-09 Thread Mike Marchywka











 To: itext-questions@lists.sourceforge.net
 From: djayaward...@westpac.com.au
 Date: Tue, 9 Feb 2010 22:30:36 +
 Subject: Re: [iText-questions] pdf does not receive image from a servlet

 Paulo Soares  glintt.com writes:


 Forget about iText for now. Can you get the image into a byte array? Once
 you get there Image.getInstance() will always work.

 Paulo


 Thanks Paul/Mark,

 Byte array did work. I got the picture into byte array and passed it to
 Image.getInstance();

After all of this, can you tell us what the problem was? Are you saying the
byte[] version worked but not the one where you pass a URL, or did you have to
change anything else?


 Thanks

 Donald






Re: [iText-questions] pdf does not receive image from a servlet

2010-02-08 Thread Mike Marchywka

Date: Mon, 8 Feb 2010 07:00:43 +0100
From: br...@lowagie.com
To: djayaward...@westpac.com.au; itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] pdf does not receive image from a servlet

Donald Jayawardena wrote:

Hi Bruno,

Sorry for confusion. Is that allright if I put the problem down here
before I put it in iText?

Not really, but your question is more clear now.

2. I am having problems getting the image into PDF from the createjpg
servlet.

I am running the following command to get the image into PDF from
createjpg servlet:
com.lowagie.text.Image iRiskNo = com.lowagie.text.Image.getInstance(new
URL(http://accord-wf-dev.unix.srv.westpac.com.au:9704ccord-wf-dev.unix.srv.westpac.com.au:9704/createjpg/createjpg;));

Is this syntax right or just a copy/paste error? There is no / after the port
number and you apparently have a duplicated string. In any case, break this
up into two steps and use an alt itext method that takes something you
can examine, like a byte[]. Also, that host name is not accessible to me.
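
The two-step idea could look something like the sketch below: first pull whatever the URL returns into a byte[], then look at it before iText ever sees it. The class and method names are made up; the only format fact used is that a JPEG file starts with the bytes FF D8. If the servlet is sending back an HTML error page instead of an image, this check would show it immediately.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class ImageBytesCheck {
    // Step 1: read the whole stream (e.g. from new URL(...).openStream()) into memory.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }

    // Step 2: examine the bytes. A JPEG starts with the SOI marker FF D8.
    static boolean looksLikeJpeg(byte[] data) {
        return data.length >= 2
                && (data[0] & 0xFF) == 0xFF
                && (data[1] & 0xFF) == 0xD8;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the servlet response; a real test would open the URL instead.
        byte[] fake = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        byte[] data = readAll(new ByteArrayInputStream(fake));
        System.out.println(data.length + " bytes, looks like JPEG: " + looksLikeJpeg(data));
        // Only once this looks right would the byte[] go to Image.getInstance(...).
    }
}
```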

wget -O ~/xxx.jpg -S -v
http://accord-wf-dev.unix.srv.westpac.com.au:9704/createjpg/createjpg;
--2010-02-08 05:50:01--
http://accord-wf-dev.unix.srv.westpac.com.au:9704/createjpg/createjpg
Resolving accord-wf-dev.unix.srv.westpac.com.au... failed: Name or service not
known.
wget: unable to resolve host address `accord-wf-dev.unix.srv.westpac.com.au'

wget -O ~/xxx.jpg -S -v
http://accord-wf-dev.unix.srv.westpac.com.au:9704/ccord-wf-dev.unix.srv.westpac.com.au:9704/createjpg/createjpg;
--2010-02-08 05:47:58--
http://accord-wf-dev.unix.srv.westpac.com.au:9704/ccord-wf-dev.unix.srv.westpac.com.au:9704/createjpg/createjpg
Resolving accord-wf-dev.unix.srv.westpac.com.au... failed: Name or service not
known.
wget: unable to resolve host address `accord-wf-dev.unix.srv.westpac.com.au'

If this is a private name it doesn't help us.

createjpg is called from the server.

During the debug, when this statement being executed, I can see the
output is showing:
SEVERE: -- returning Frame NULL
SEVERE: BaseDialog: owner frame is a java.awt.Frame

THIS IS NOT, I REPEAT, THIS IS NOT AN iText ERROR MESSAGE!!!
Please understand that you don't have to look at iText when
looking for the problem.

LOL, have you ever seen similarly labelled output from other servlets?
Again, hunt around the sun site or grep your servlet engine docs
or your own servlet source code. How do you handle errors?
I'm still thinking this is an attempt to pop up a server side dialog
box with no gui available to the jvm, but it is just a guess.
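
To illustrate the guess (and it is only a guess, nothing here confirms where the SEVERE lines come from): error handling code written on a desktop may try to show a dialog, which makes no sense in a server JVM. A handler could check for a display first and log instead. The class name and return values below are invented for the sketch; GraphicsEnvironment.isHeadless() and java.util.logging are standard JDK.

```java
import java.awt.GraphicsEnvironment;
import java.util.logging.Level;
import java.util.logging.Logger;

class ServerSafeErrorReport {
    private static final Logger LOG =
            Logger.getLogger(ServerSafeErrorReport.class.getName());

    // Reports an error without assuming a GUI; returns what was done, for illustration.
    static String report(String message, Throwable cause, boolean headless) {
        if (headless) {
            // No display available: write to the log rather than opening a dialog.
            LOG.log(Level.SEVERE, message, cause);
            return "logged";
        }
        // On a desktop this is where a dialog could be shown; left out of the sketch.
        return "dialog";
    }

    // Convenience overload that asks the JVM whether a display is available.
    static String report(String message, Throwable cause) {
        return report(message, cause, GraphicsEnvironment.isHeadless());
    }

    public static void main(String[] args) {
        System.out.println(report("could not create image", new java.io.IOException("demo")));
    }
}
```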

From the logs, I could see that the calling to createjpg servlet
happened successfully.

createjpg servlet produces the image as follows:
JPEGImageEncoder encoder = JPEGCodec.createJPEGEncoder(out);
JPEGEncodeParam param =
encoder.getDefaultJPEGEncodeParam(bufferedImage);
param.setQuality(1.0f, false);
encoder.setJPEGEncodeParam(param);
encoder.encode(bufferedImage);

Hope this does not confuse you.

You should understand that calling a servlet from a client
IS NOT THE SAME as calling a servlet from the server.

A servlet is just a java class- you can't type that into a browser address bar
but web server can invoke an instance thereof when the right url comes up.
However, there is nothing to prevent you from calling one yourself
in other server side java code if you can load the class def somehow.

It works for the HTML, because the browser is calling
the servlet through HTTP on the internet/intranet.

Actually, the presence of NATs can be confusing if you use the
same IP or even hostname and don't account for this.
Code making requests from the client may need a different host
than the server, and the server needn't know who 127.0.0.1 is
or know its own name ( there could be many virtual hosts).

That doesn't mean your server permissions allow you to
call the servlet.

This is really not an iText problem.

It does seem to be using your time :)

I'm forwarding this to the mailing list so that others can confirm.
best regards,
Bruno


Re: [iText-questions] Itext doesn´t work with Barc ode128 Font

2010-02-07 Thread Mike Marchywka











 From: psoa...@glintt.com
 To: itext-questions@lists.sourceforge.net
 Date: Sun, 7 Feb 2010 12:59:38 +
 Subject: Re: [iText-questions] Itext doesn´t work with Barcode128 Font

 This was already explained before: the font has a flag that says that it's a
 symbolic font and symbolic fonts can only have 256 characters. iText could
 ignore that flag but it would then fail with truly symbolic fonts because
 symbolic fonts expect a particular encoding. The font may work in Word
 because the encoding problems in an interactive application are not as
 relevant as in a PDF. This situation may have to be addressed by iText,
 after all it already fixes broken fonts and PDFs, but I've no idea if/when
 that will happen.

At this point, it would be common to mention, for no particular
purpose, that the source code is available and you can't
predict when any of the interested users will check in a fix. LOL.
It sounds like you are just suggesting the code needs a //
somewhere. Often I get open source stuff and make private hardcoded
modifications that would be of no use to anyone else, but if
it is a matter of adding a method, that may be easy and reusable.




 Paulo

 - Original Message -
 From: Claudia Murialdo 
 To: Post all your questions about iText here
 
 Sent: Sunday, February 07, 2010 12:24 AM
 Subject: Re: [iText-questions] Itext doesn´t work with Barcode128 Font


 1) Yes it is ok.
 2) But Barcode 128 is just a font, isn't it? So if I want to print
 only the character Š, it should be possible. Am I right? Using itext,
 this character is not printed; however, it is a valid character which
 is part of the character table of Barcode 128. I can see it in the
 Character Map utility of Windows when I choose this font. This char,
 actually a few characters, the last ones of the valid character table
 of Barcode128, are ignored when I use itext to print them.

 I tried the built-in barcode system of itext, generating images, and the
 generated image is perfect for the original text
 (3309072963568700011355003017381600349594), but I need to do it
 using the font because it is part of a generic program, and the
 program receives the coded text (they need to choose exactly what
 symbology to use, A, B, or C, they need that).

 Could you download the barcode I uploaded at
 http://www.usaupload.net/d/5qpza92olsd?. So you can see the problem.

 Thank you.
 Claudia.


 On Thu, Feb 4, 2010 at 4:45 PM, Mark Storer  wrote:
 1) Does your string contain the start/stop characters & checksum already?
 If not, you won't see them.

 2) Just because it's a valid string doesn't mean it's a valid Barcode128
 string. Each symbology has its own requirements. The online barcode
 generator at http://www.morovia.com/free-online-barcode-generator/ didn't
 seem to like your input string. Several missing-character characters
 appear in the text below the bars, and there's no telling what they're
 represented as in the graphic portion.

 iText has its own built in barcode system, I suggest giving it a shot.

 --Mark Storer
 Senior Software Engineer
 Cardiff.com

 #include 
 typedef std::Disclaimer DisCard;



 -Original Message-
 From: Claudia Murialdo [mailto:cmuria...@gmail.com]
 Sent: Thursday, February 04, 2010 6:46 AM
 To: itext-questions@lists.sourceforge.net
 Subject: [iText-questions] Itext doesn´t work with Barcode128 Font


 I'm using itext to generate a PDF document using a TrueType font for
 BarCode128. The problem is that the start and stop characters are not
 printed.
 The text is: ‰A)'=_Xwè!-Wèèè1F0èBÀ~UŠ It corresponds to the string
 3309072963568700011355003017381600349594 converted to Barcode 128.
 It is a valid string since I see it OK in Word, in a browser, and in
 several kinds of editors.
 Why can't I see the barcode OK when it is generated using itext?

 I uploaded the barcode 128 here http://www.usaupload.net/d/5qpza92olsd

 This is the code:

 Rectangle pageSize = new Rectangle(780, 525);
 Document document = new Document(pageSize);

 PdfWriter writer = PdfWriter.GetInstance(document,
 File.OpenWrite("Test.pdf"));
 document.Open();

 PdfContentByte cb = writer.DirectContent;
 BaseFont bf =
 BaseFont.CreateFont(@"C:\WINDOWS\Fonts\bcode128.ttf",
 BaseFont.IDENTITY_H, BaseFont.EMBEDDED);

 cb.SetFontAndSize(bf, 50);
 cb.BeginText();
 cb.ShowTextAligned(Element.ALIGN_CENTER,
 "‰A)'=_Xwè!-Wèèè1F0èBÀ~UŠ", 200, 400, 0f);
 cb.EndText();
 document.Close();

 Regards,
 Claudia.



Re: [iText-questions] Does not get an image from a servelt

2010-02-05 Thread Mike Marchywka












 Date: Fri, 5 Feb 2010 18:11:33 +0100
 From: i...@1t3xt.info
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] Does not get an image from a servelt

 Donald Jayawardena wrote:
 Mike Marchywka  hotmail.com writes:

 The statement
 com.lowagie.text.Image iRiskNo = com.lowagie.text.Image.getInstance
 ("http://localhost:8080/createjpg/createjpg");
 gives an error as follows:

 SEVERE: owner frame is a java.awt.Frame

 The createjpg servlet returns an image of an AWT frame.
 Does this mean that com.lowagie.text.Image.getInstance() cannot handle AWT
 frames?

 ???
 What do you mean by this question? The more mails you send the less
 people understand what you're talking about. If you want an answer,
 you'll have to stop confusing us, and start giving us information about
 the problem that makes sense.

Having not looked at the source or your error handling approach,
I finally decided this could be something you emit (for lack of
a more precise word) from somewhere in iText to wherever
the OP can paste text. I take it you are not aware of any such possibility.
Maybe you could just say that much, or explain where in iText this error
comes from :)

The consensus seems to be, judging from a few comments and much silence,
that no one knows where this message comes from, but the servlet engine may
be a reasonable candidate. I had earlier speculated that someone could
have turned an IO exception into an attempt to pop up a dialog box
in the server JVM with no GUI. IIRC, it was claimed that this worked
standalone or in some other test environment. The OP may have created an error
handler based on immediate human feedback. The Chernobyl effect I mentioned
earlier is the process of turning a simple problem into a much larger one
through a series of questionable responses to each exceptional event.


 --
 This answer is provided by 1T3XT BVBA
 http://www.1t3xt.com/ - http://www.1t3xt.info

 ___
 iText-questions mailing list
 iText-questions@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/itext-questions

 Buy the iText book: http://www.1t3xt.com/docs/book.php
 Check the site with examples before you ask questions: 
 http://www.1t3xt.info/examples/
 You can also search the keywords list: http://1t3xt.info/tutorials/keywords/
  

Re: [iText-questions] Does not get an image from a servelt

2010-02-04 Thread Mike Marchywka











 Date: Thu, 4 Feb 2010 08:01:36 +0100
 From: i...@1t3xt.info
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] Does not get an image from a servelt

 Donald Jayawardena wrote:
 Thanks for your solution. Once the main servlet (riskmap) has called the image servlet
 (createjpg), createjpg writes to a log file after generating the image. I
 can see that the riskmap call to createjpg is working fine, but the virtual image does
 not reach the parent servlet (riskmap). The parent servlet does not
 give any error. How can I trap the stack trace without it returning any
 exception?

 You can't.
 The main question is: can the parent servlet(riskmap) be rewritten
 so that iText is not involved? Can it retrieve the byte[] with the
 image? I guess not, because iText doesn't do anything special.
 It just creates an URL object and calls openStream() to get the

I guess this always creates problems when designing an API. Do you expose
something like public Widget makeWidget(HighRiskErrorProneThing x) throws Throwable,
or do you trap exceptions internally, return null, and let the caller sort through
the parameters in steps?

But, in any case, breaking it up into higher-risk and lower-risk steps
would be helpful. Any IO or user interaction is high-risk, as anything can happen.
Presumably once you have your bytes, iText will behave predictably based only on
the validity of the byte array. If you get back null or an exception, you can
pass the byte array around and do various checks.
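
The split described above might be sketched like this: do the high-risk network read first, sanity-check the bytes, and only then hand them to the PDF library. This is a minimal sketch that uses only the JDK. The names ImageBytes, fetch, readAll and looksLikeJpeg are made up for illustration, and the final hand-off to iText (for example through an Image.getInstance overload that takes a byte array) is left as a comment because it depends on the version in use.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

public class ImageBytes {

    /** High-risk step: read everything the URL returns into memory. */
    public static byte[] fetch(URL url) throws IOException {
        try (InputStream in = url.openStream()) {
            return readAll(in);
        }
    }

    /** Drains a stream into a byte array. */
    public static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    /** A JPEG stream starts with the two-byte SOI marker FF D8. */
    public static boolean looksLikeJpeg(byte[] data) {
        return data != null && data.length >= 2
                && (data[0] & 0xFF) == 0xFF
                && (data[1] & 0xFF) == 0xD8;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ImageBytes <image-url>");
            return;
        }
        byte[] data = fetch(new URL(args[0]));
        System.out.println(data.length + " bytes, JPEG header: " + looksLikeJpeg(data));
        // Low-risk step (assumption): pass `data` to the PDF library only
        // after the check above succeeds.
    }
}
```

If the check fails, the bytes can be written to a file and inspected separately, which keeps the servlet problem apart from anything the PDF library does.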

btw, was that thing trying to pop up a dialog box on the server?






 bytes of the Image... Although it does this multiple times; maybe
 you should first get the image bytes and then feed them to iText.

 Nevertheless: the error you're mentioning is not related to the problem.
 That's very strange.
 --
 This answer is provided by 1T3XT BVBA
 http://www.1t3xt.com/ - http://www.1t3xt.info


Re: [iText-questions] get image into pdf

2010-02-03 Thread Mike Marchywka

 Date: Wed, 3 Feb 2010 09:12:10 +0100
 From: i...@1t3xt.info
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] get image into pdf

 Donald Jayawardena wrote:

 Hi,

 I bought iText in Action but could not find the answer there.

 And the answer won't be in the second edition either,
 because getting an image in a Servlet is not an iText
 related problem.

LOL I can't remember why exactly but many years ago this was a recurring
issue on a servlet list (  how do I do some complicated thing unrelated to
servlets... using a servlet?). 

 My problem is that the PDF document is trying to receive an image from a servlet,
 but it receives null.

 Can you get it working in a standalone example?

 The netBeans (with Tomcat) gives the following error:
 *SEVERE: -- returning Frame NULL*
 *SEVERE: BaseDialog: owner frame is a java.awt.Frame*

 BaseDialog? Frame? Definitely not an iText problem.

I'm not even sure what this has to do with a servlet. Is your
servlet supposed to popup a dialog box? 

 When I run the servlet alone, it produces an image.

What does this mean? Do you run it locally or call it from a
standalone app?

 So you can deploy the servlet and it works when you use:

 The statements I use:
 Image iRiskNo = null;
 iRiskNo = Image.getInstance(new
 URL("http://localhost:8080/createjpg/createjpg"));

I don't know what all you are doing or where you are testing
but is localhost changing at any time? 

 Then it's definitely not an iText problem.

 Please advise me what to do.

 If it works when you deploy it on Tomcat and use it with a
 normal browser, but it doesn't work when you deploy it in
 NetBeans, then the problem is a NetBeans problem. (Some
 frame or dialog that can't be accessed?)

Servlets aren't really supposed to do this (at least they weren't many years
ago). They are supposed to live during the lifetime of a connection to something
and not themselves interact with humans.


Re: [iText-questions] Does not get an image from a servelt

2010-02-03 Thread Mike Marchywka




I think it would help if you could get a stack trace from an exception
that involves iText. Use some other utility to verify that the URL
you have is valid, etc. Develop some debug strategy for
your servlets that catches everything and logs to a file somewhere.
I think when you posted this earlier no one knew what to make
of your error report.


 For the pdf page in riskmap servlet,
 I have used getInstance call as below:

 Image iRiskNo = null;

 iRiskNo =
 com.lowagie.text.Image.getInstance("http://localhost:8080/createjpg/createjpg");



 When it runs the above statement, it
 gives the following error

 SEVERE: -- returning Frame
 NULL

 SEVERE: BaseDialog: owner frame is
 a java.awt.Frame

Who is the "it" you mention, and what is a BaseDialog?
That call appears to return an Image, not something that obviously
relates to your text above, even via Image.toString().
Is Lowagie saying this is severe? If your call has to throw an IOException
or something, what does your code do? The error you are printing could be anything:
someone could be trying to put up a dialog box, and of course the servlet probably
isn't attached to a GUI, the severity being due to the Chernobyl effect acting on
a common exception. Can you put a try/catch around the code and dump the stack somewhere
like a log file? I forget the normal error handling approaches for servlets, but popping
up a dialog isn't the first thought I would have.
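
A try/catch that dumps the stack to a log file could look roughly like the sketch below. It uses only the JDK; TraceLog, stackTraceOf, appendTo and runLogged are hypothetical helper names, and the log path is whatever the caller chooses.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.concurrent.Callable;

public class TraceLog {

    /** Renders the full stack trace of t as text. */
    public static String stackTraceOf(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw, true));
        return sw.toString();
    }

    /** Appends the trace to a log file chosen by the caller. */
    public static void appendTo(String logPath, Throwable t) {
        try (PrintWriter out = new PrintWriter(new FileWriter(logPath, true))) {
            out.println(stackTraceOf(t));
        } catch (IOException e) {
            // Last resort: fall back to stderr if the log file cannot be written.
            t.printStackTrace();
        }
    }

    /** Runs the task; on any failure logs the trace and returns null. */
    public static <T> T runLogged(Callable<T> task, String logPath) {
        try {
            return task.call();
        } catch (Throwable t) {
            appendTo(logPath, t);
            return null;
        }
    }
}
```

In the servlet, the suspect call would be wrapped along the lines of TraceLog.runLogged(() -> Image.getInstance(url), logPath), where url and logPath are placeholders. A trace that mentions lowagie points at iText; one that does not points elsewhere.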






 In the createjpg servlet, the statements
 that generate the image are as follows:


 created a bufferedimage and ...

 ...

 JPEGEncodeParam
 eP = JPEGCodec.getDefaultJPEGEncodeParam(bufferedImage);

 eP.setQuality(1.0f,
 true);

 JPEGImageEncoder
 encoder = JPEGCodec.createJPEGEncoder(out);

 encoder.encode(bufferedImage,
 eP);



Did you try to hit the URL you expect to contain your image using something like wget,
and verify that the server is returning a valid image file with any headers that may be relevant?


 I imported the following classes in
 createjpg servlet:

 import java.awt.*;

 import java.awt.geom.*;

 import java.awt.image.*;

 import java.awt.Color;

 import java.awt.Font;

 import java.io.*;

 import java.io.IOException;

 import com.sun.image.codec.jpeg.JPEGEncodeParam;

 import com.sun.image.codec.jpeg.JPEGImageEncoder;

 import javax.servlet.http.*;

 import javax.servlet.*;

 import com.sun.image.codec.jpeg.JPEGCodec;



Are you worried about name conflicts? What does this tell us?




 As I mentioned above, the riskmap servlet
 (pdf objects) does not receive images from createjpg servlet.

 Can someone please let me know how I
 can troubleshoot/solve this problem?.

Go to sun.com and find debugging strategies for servlets, and come back here
with any stack traces that mention lowagie or itext.





  

Re: [iText-questions] Chinese font in linux could not displayed.

2010-01-29 Thread Mike Marchywka

 Date: Fri, 29 Jan 2010 13:23:34 +0800
 From:
 To: itext-questions@lists.sourceforge.net
 Subject: [iText-questions] Chinese font in linux could not displayed.

 Hi, all:

 I use iText to export an SWT chart to a PDF file. The chart contains some Chinese
 characters, but all of them are displayed as blanks.

 I have used itextasian.jar in my project. I can now export the PDF file
 with Chinese text correctly on Windows, but when it runs on Linux (Ubuntu 9.10),
 the Chinese characters are displayed only as blanks.

 Any reply is welcome.

You are getting the same silent failure mode in the first and last cases? Generally,
porting Java is quite simple unless you rely on some native support (JNI and a DLL, for example).
However, jars often end up in the wrong place or with missing permissions. Is there any way
to get more diagnostics out, so that someone can complain about a font not being found, for example?
I guess if you are sure the jar is in the right place and there aren't any
Java version problems (type `which java` and make sure it picks up something recent from Sun),
try chmod 755 on the jar so everyone can read and execute it.
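
One way to get a little more diagnostic output is to ask the JVM directly what it can see. The sketch below is JDK-only and the class name ClasspathCheck is made up. Which class or resource name to probe inside the Asian font jar is an assumption left to the reader, so it is passed on the command line.

```java
import java.io.File;
import java.net.URL;

public class ClasspathCheck {

    /** True if the named class can be located without initialising it. */
    public static boolean classAvailable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** True if the named resource (slash-separated path) is on the classpath. */
    public static boolean resourceAvailable(String resourcePath) {
        URL url = ClasspathCheck.class.getClassLoader().getResource(resourcePath);
        return url != null;
    }

    /** Reports whether a file such as a jar exists and is readable. */
    public static String describeFile(String path) {
        File f = new File(path);
        return path + ": exists=" + f.exists() + ", readable=" + f.canRead();
    }

    public static void main(String[] args) {
        System.out.println("java.version    = " + System.getProperty("java.version"));
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        for (String name : args) {
            // Each argument is treated as a resource path to look up.
            System.out.println(name + " on classpath: " + resourceAvailable(name));
        }
    }
}
```

Running this on both the Windows and the Linux machine with the same arguments shows whether the two environments actually load the same jars.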

 This is some of my code:

 BufferedOutputStream out = new BufferedOutputStream(new
     FileOutputStream(fileName));

 // convert chart to PDF with iText:
 Rectangle pagesize = new Rectangle(width, height);
 Document document = new Document(pagesize, 50, 50, 50, 50);
 try {
     PdfWriter writer = PdfWriter.getInstance(document, out);
     document.addAuthor("popjxc"); //$NON-NLS-1$
     document.open();

     PdfContentByte cb = writer.getDirectContent();
     PdfTemplate tp = cb.createTemplate(width, height);
     Graphics2D g2 = tp.createGraphics(width, height,
         new AsianFontMapper("STSongStd-Light", // support Chinese CharSet
             "UniGB-UCS2-H"));

     Rectangle2D r2D = new Rectangle2D.Double(0, 0, width,
         height);
     piechart.draw(g2, r2D, null);
     g2.dispose();
     cb.addTemplate(tp, 0, 0);

     document.newPage();

     tp = cb.createTemplate(width, height);
     g2 = tp.createGraphics(width, height,
         new AsianFontMapper("STSongStd-Light",
             "UniGB-UCS2-H"));

     r2D = new Rectangle2D.Double(0, 0, width,
         height);
     barchart.draw(g2, r2D, null);
     g2.dispose();
     cb.addTemplate(tp, 0, 0);
 } finally {
     document.close();
 }


Re: [iText-questions] Chinese font in linux could not displayed.

2010-01-29 Thread Mike Marchywka










 Date: Fri, 29 Jan 2010 13:04:39 +0100
 From:
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] Chinese font in linux could not displayed.

 Mike Marchywka wrote:
 You are getting the same silent failure mode in first and last case?

 If I understand correctly, whether or not Java is used is irrelevant.
 It's about the PDF. CJK fonts are never embedded, and it's perfectly
 normal that the glyphs don't show up if the font isn't available.

I think the OP claimed adding a jar file fixed the behaviour
in the one case.
Doing nothing is fine for normal usage but on either end,
authoring or displaying, you really need to have tools that 
can accept a -verbose option to explain what they are doing 
( or at least provide source code so you can see for yourself).



 --
 This answer is provided by 1T3XT BVBA
 http://www.1t3xt.com/ - http://www.1t3xt.info


Re: [iText-questions] Extract jpeg image color problem

2010-01-27 Thread Mike Marchywka

Since no one else replied,

 Date: Wed, 27 Jan 2010 12:58:57 +0200
 From: 
 To: itext-questions@lists.sourceforge.net
 Subject: [iText-questions] Extract jpeg image color problem

 Hi

 I can successfully extract a jpeg image from a PDF document, but the color is 
 all messed up.

Did you do this with itext? In any case can you post some code? 

 Any help would be appreciated

It depends what you mean by messed up. I'll assume this is not a well-known issue, so some
details may help. In particular, is the color map shifted through the entire image
(r-g for example), or does it change on each line? I've seen this a lot with various
image formats and lines with non-mod-N length, since padding specs are often ambiguous
and lower-level code may just do whatever the machine does. Is this a 64-bit machine, for example?
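
To make the padding point concrete, here is a small sketch of the arithmetic, not tied to iText or to any particular image format. RowStride, paddedStride and offsetOf are illustrative names. It assumes raw rows of packed pixels where one side pads each row up to a multiple of N bytes and the other side does not.

```java
public class RowStride {

    /** Bytes per row once the row is padded up to a multiple of alignment. */
    public static int paddedStride(int widthPixels, int bytesPerPixel, int alignment) {
        int raw = widthPixels * bytesPerPixel;
        return ((raw + alignment - 1) / alignment) * alignment;
    }

    /** Offset of the first byte of pixel (x, y) in a row-major buffer. */
    public static int offsetOf(int x, int y, int stride, int bytesPerPixel) {
        return y * stride + x * bytesPerPixel;
    }

    public static void main(String[] args) {
        int width = 2;
        int bpp = 3; // RGB, one byte per channel
        int packed = paddedStride(width, bpp, 1); // writer with no padding
        int padded = paddedStride(width, bpp, 4); // reader assuming 4-byte rows
        for (int y = 0; y < 4; y++) {
            int drift = offsetOf(0, y, padded, bpp) - offsetOf(0, y, packed, bpp);
            // A drift that is not a multiple of bpp lands on the wrong channel.
            System.out.println("row " + y + ": drift " + drift
                    + " bytes, channel offset " + (drift % bpp));
        }
    }
}
```

When the two sides disagree, the channel offset changes from row to row, which is the "changes on each line" symptom. A uniform shift across the whole image would point to something else.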

Now that you mention it, I think I may have seen rendered pages from the open source viewer
I used that had color shifts (a uniform color table change that looks like an off-by-one RGB alignment issue).
I'm used to seeing this from various sources, and it wasn't important at the time, so I didn't
track it down. I have also noted that different image viewers can display the same jpg
(presumably one that is not quite right but still a jpg, LOL) differently.
Have you tried different viewers, or examined the jpg bytes to see what it should look like?

 Thanks


Re: [iText-questions] PDFTable issue

2010-01-25 Thread Mike Marchywka












 Date: Mon, 25 Jan 2010 12:39:09 +0100

 To: itext-questions@lists.sourceforge.net
 Subject: [iText-questions] PDFTable issue








 Hi ...

 I have a problem with PDFTable under Java. When I try to create a cell
 with colspan it works. Same with rowspan. But if I try to enable both
 of them on one cell, Java throws a NullPointerException. For example, I can't
 create a table like this:

Do you have a stack trace or something? You may get lucky and someone
will know of a common issue but if you could post the details of the exception
someone who happens to be browsing the source code may be able
to help you too.




 If there is any solution pls write it to me! Tnx for help!

 Adam Sandor
  
--
Throughout its 18-year history, RSA Conference consistently attracts the
world's best and brightest in the field, creating opportunities for Conference
attendees to learn about information security's most important issues through
interactions with peers, luminaries and emerging and established companies.
http://p.sf.net/sfu/rsaconf-dev2dev

Re: [iText-questions] Reg : pdf to jpeg conversion

2010-01-22 Thread Mike Marchywka




 Date: Fri, 22 Jan 2010 14:33:24 +0100
 From: i...@1t3xt.info
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] Reg : pdf to jpeg conversion

 Murali Jillella wrote:
 Is it possible to get the contents of the image into a byte array?

 No, when you create an Image this way, you're creating a Form XObject,
 not an Image XObject. Big difference!

 I
 have noticed that jpeg.getOriginalData() returns null. Why?

 Because there is no original data; only PDF syntax.
 iText doesn't do PDF to JPG conversions.

I would suggest an open source renderer, but in response to your question
I started looking around sun.com, and maybe there are some simple
things that work with JMF or JAI for this. I couldn't
tell, as these were largely forum posts, and then
I decided it wasn't that big a deal to me. You may find
something simple over there; I'm not sure, as they keep adding
stuff to Java.



 --
 This answer is provided by 1T3XT BVBA
 http://www.1t3xt.com/ - http://www.1t3xt.info


Re: [iText-questions] Number of page after merging pdf file

2010-01-19 Thread Mike Marchywka











 Date: Tue, 19 Jan 2010 18:21:05 +0100
 From: 
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] Number of page after merging pdf file

 Degla Degla wrote:
 I'm using a method to merge PDF documents, but I can't find a way to generate
 correct page numbers in the footer. To explain: when I merge PDF files, my
 document keeps the original page numbers instead of a continuous numbering
 from 1 to n (I have many pages numbered 1).

 Your PDF is like a book.
 Merging is like taking photocopies.
 You can't remove the original page number.
 That's inherent to PDF, didn't you know?

I thought Leonard had many comments about preserving the logical structure
of documents. Isn't there any standard way in which a PDF authoring tool
could tell everyone else 'this character here is a page number with value foo'?
Presumably such a facility would let someone manipulating the document manipulate
identifiable things.
If you were set on preserving logic, structure, and information coherence
while producing a cute picture that the boss likes, what options
would you have? Does the iText book discuss this at all? It could
save a lot of people from dead ends.


Thanks.


 You could add new page numbers, though.
 --
 This answer is provided by 1T3XT BVBA
 http://www.1t3xt.com/ - http://www.1t3xt.info


Re: [iText-questions] Get table from PDF IText

2010-01-16 Thread Mike Marchywka

 Date: Sat, 16 Jan 2010 11:04:12 +0100
 From: i...@1t3xt.info
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] Get table from PDF IText

 aro1982 wrote:
 Is there any solution to get table (structure, cells content) from existing
 PDF? I've created some PDF with PdfPTable and I want to get this table from
 file. I've tried it in many ways but it is very difficult to do and I can't
 find any examples which can help me.

 That's because you're trying something that is impossible.

Are there qualifications or alternative approaches here? That is, Leonard has sometimes
offered that it is possible to preserve the logical structure in a document
so that people who want to use computers to automate data processing,
instead of just looking at pictures, can do so with a PDF file, with varying amounts of effort.
If an ambitious PDF author wanted to allow a user to extract a CSV file
equivalent to his table, without all the formatting junk and with just the data,
how might he go about designing the document? Data generally comes into
forms either from manual entry (typing) or from some other source in a
character format, not garbled into pixels using an arbitrary font. It
would be nice in many cases to preserve this information.

Thanks.

 --
 This answer is provided by 1T3XT BVBA
 http://www.1t3xt.com/ - http://www.1t3xt.info

