Re: [iText-questions] Clarification on PdfCopy.freeReader()
OK - that's what it looked like. So nothing bad would happen, we'd just wind up with resource content streams getting added multiple times. I think that PdfSmartCopy mostly addresses the downsides of that...

Thanks,

- K

Paulo Soares-3 wrote:
>
> freeReader() makes the writer instance forget about that particular pdf.
> You may open the same pdf again but it will be like a different pdf and
> won't use any of the shared resources from the first one that you could
> use if the pdf was not freed. freeReader() is meant to be used when you're
> done with the doc, not if you intend to use it later.
>
> Paulo
>
> From: 'Kevin Day' [ke...@trumpetinc.com]
> Sent: Thursday, April 29, 2010 12:55 AM
> To: IText Questions
> Subject: [iText-questions] Clarification on PdfCopy.freeReader()
>
> I'm trying to get a handle on the implications of using the freeReader()
> call of PdfCopy. If I call this, can I not add more pages from the
> 'freed' reader to the same PdfCopy instance? Or is it safe to call:
>
> pdfCopy.freeReader(myReader);
> PdfImportedPage imported = pdfCopy.getImportedPage(myReader, 1);
> pdfCopy.addPage(imported);
> pdfCopy.freeReader(myReader);
> PdfImportedPage imported = pdfCopy.getImportedPage(myReader, 2);
> pdfCopy.addPage(imported);
>
> Obviously, in a real app, these calls would not be in the same block of
> code.
>
> This is kind of a long way of asking if it wouldn't be better to just
> implicitly call freeReader() in addPage() if the reader is different from
> the currentReader. I suspect that this would have performance
> implications for readers opened in partial mode, but I wanted to check my
> thinking.
>
> Thanks,
>
> - K
>

--
View this message in context: http://old.nabble.com/Clarification-on-PdfCopy.freeReader%28%29-tp28395359p28403248.html
Sent from the iText - General mailing list archive at Nabble.com.

___
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions

Buy the iText book: http://www.itextpdf.com/book/
Check the site with examples before you ask questions: http://www.1t3xt.info/examples/
You can also search the keywords list: http://1t3xt.info/tutorials/keywords/
Re: [iText-questions] Performance when flattening form fields
Mike - can we please reserve this thread for a technical discussion of the merits of the proposal? I'd be happy to have a conversation in a separate thread regarding how iText works.

- K

Mike Marchywka-2 wrote:
>
>> Date: Sun, 25 Apr 2010 22:14:02 -0700
>> From: forum_...@trumpetinc.com
>> To: itext-questions@lists.sourceforge.net
>> Subject: Re: [iText-questions] Performance when flattening form fields
>>
>> After more digging, I'm wondering if the place to do this wouldn't be in
>> the PdfCopy.PageStamp class? It seems like PageStamp.alterContents() could
>> do the same flattening operation that PdfStamper does.
>>
>> The ideal would be to factor out the behavior so the code isn't
>> duplicated in both PdfCopy and PdfStamper...
>
> I guess I have the larger question of exactly what parsing is.
> That is, it seems generally you use itext to 1) read in something, often
> an existing pdf, 2) do some stuff, then 3) write out a pdf. Presumably
> as you go through step 2, you are assembling or compiling a bunch
> of structures that allow you to do step 3 but are more optimized
> for manipulating and editing the nascent PDF.
> If I understand your earlier comments, you apparently don't actually
> have a generic PDF parser to do step 1 that works with all sequences
> you could put into step 2. Now, of course, more generally the
> above approach doesn't scale, as you would always hope to stream
> to some extent - read what you need, write what you can etc.
> However, that could probably be hidden somewhat in the implementation
> of the classes for each step.
> So, instead of things like PdfCoolFeature.doSomething(byte[] pdffile)
> you have PdfCoolFeature.doSomething(ParsedPdfOperand pdflikething)
> where the second signature takes a parameter that is generally
> optimized for a broad class of common operations.
>
>> Does anyone see any technical issues with this as a strategy?
>>
>> - K
>>
>> 'Kevin Day' wrote:
>>>
>>> I've been doing some digging into the performance question that Giovanni
>>> Azua has posted about.
>>>
>>> Some of his findings (using StringBuilder, etc...) are solid improvements
>>> to overall iText performance - however, the crux of the performance
>>> difference he is seeing between iText and the competing solution is not
>>> low level. It's a high level issue.
>>>
>>> Here's what's going on:
>>>
>>> His specific use case involves stamping headers and footers onto
>>> pages. The footer contains AcroFields that must be flattened prior
>>> to stamping.
>>>
>>> The performance hit is coming from the fact that, in order to flatten
>>> and apply the footer, he is having to:
>>>
>>> 1. Construct a PDF using PdfStamper
>>> 2. Write output to a byte array output stream
>>> 3. Re-parse the BAOS into a PdfReader
>>> 4. Import the page from the reader for use as a stamp
>>>
>>> While this is functional, it is certainly not performant.
>>>
>>> A much, much faster technique would be to do the flattening to the
>>> *reader*, then just import the page to the output writer. This
>>> avoids the awkward creation of the temporary PdfReader.
>>>
>>> So, the performance delta is not caused so much by iText's low level
>>> implementation (although the performance improvements that Giovanni has
>>> suggested will help to make iText even faster than it already is) - the
>>> delta is really caused by an awkward operation forced on the user by the
>>> framework.
>>>
>>> So, are there any fundamental reasons to not do flattening, etc... to
>>> the PdfReader? My first look at the code indicates that it may be
>>> possible to factor this out of PdfStamper (basically, instead of
>>> adjusting the AcroFields dictionary and content streams in the
>>> PdfStamper/PdfCopy/etc... output, we'd make those adjustments to the
>>> PdfReader).
>>>
>>> I'm thinking of something along the lines of:
>>>
>>> PdfFormFlattener(PdfReader).flatten(pageNumber)
>>>
>>> Maybe with supplemental methods for flattenNamedFields(pageNumber),
>>> flattenFieldsOfType(pageNumber)
>>>
>>> Thoughts?
>>>
>>> - K
>>>
>>
>> --
>> View this message in context:
>> http://old.nabble.com/Performance-when-flattening-form-fields-tp28357673p28360908.html
>>
Re: [iText-questions] Performance when flattening form fields
After more digging, I'm wondering if the place to do this wouldn't be in the PdfCopy.PageStamp class? It seems like PageStamp.alterContents() could do the same flattening operation that PdfStamper does.

The ideal would be to factor out the behavior so the code isn't duplicated in both PdfCopy and PdfStamper...

Does anyone see any technical issues with this as a strategy?

- K

'Kevin Day' wrote:
>
> I've been doing some digging into the performance question that Giovanni
> Azua has posted about.
>
> Some of his findings (using StringBuilder, etc...) are solid improvements
> to overall iText performance - however, the crux of the performance
> difference he is seeing between iText and the competing solution is not
> low level. It's a high level issue.
>
> Here's what's going on:
>
> His specific use case involves stamping headers and footers onto
> pages. The footer contains AcroFields that must be flattened prior
> to stamping.
>
> The performance hit is coming from the fact that, in order to flatten and
> apply the footer, he is having to:
>
> 1. Construct a PDF using PdfStamper
> 2. Write output to a byte array output stream
> 3. Re-parse the BAOS into a PdfReader
> 4. Import the page from the reader for use as a stamp
>
> While this is functional, it is certainly not performant.
>
> A much, much faster technique would be to do the flattening to the
> *reader*, then just import the page to the output writer. This
> avoids the awkward creation of the temporary PdfReader.
>
> So, the performance delta is not caused so much by iText's low level
> implementation (although the performance improvements that Giovanni has
> suggested will help to make iText even faster than it already is) - the
> delta is really caused by an awkward operation forced on the user by the
> framework.
>
> So, are there any fundamental reasons to not do flattening, etc... to the
> PdfReader? My first look at the code indicates that it may be
> possible to factor this out of PdfStamper (basically, instead of adjusting
> the AcroFields dictionary and content streams in the
> PdfStamper/PdfCopy/etc... output, we'd make those adjustments to the
> PdfReader).
>
> I'm thinking of something along the lines of:
>
> PdfFormFlattener(PdfReader).flatten(pageNumber)
>
> Maybe with supplemental methods for flattenNamedFields(pageNumber),
> flattenFieldsOfType(pageNumber)
>
> Thoughts?
>
> - K
>

--
View this message in context: http://old.nabble.com/Performance-when-flattening-form-fields-tp28357673p28360908.html
Re: [iText-questions] performance follow up
If the file is being entirely pre-loaded, then I doubt that IO blocking is a significant contributing factor to your test.

I think that the best clue here may be the difference between performance with form flattening and without form flattening. Just to confirm, am I right in saying that iText outperforms the competitor by a significant amount in the non-flattening scenario?

If that's the case, then it seems like we should see significant differences in the profiling results between the flattening and non-flattening scenarios in iText. Would you be willing to post the profiling results for both cases so we can see which code paths are consuming the most runtime in each?

Another possibility, if the profiling results show similar hotspots, is that the form flattening algorithms in iText are using the hotspot areas a lot more than in the non-flattening case. There may be a bunch of redundant reads or something in the flattening case.

Let's take a look at the profiling results and see if we can draw any conclusions about where to go next.

BTW - which profiler are you using? Are you able to expand each of the hotspot code paths and see the actual call path that is causing the bottleneck? I use jvvm, and the results of expanding the hotspot call trees can be quite illuminating.

What I really would like is to get ahold of your two benchmark tests (with and without flattening) so I can run them on my system - do you have anything you can package up and share?

- K

Giovanni Azua-2 wrote:
>
> Hello,
>
> On Apr 23, 2010, at 10:50 PM, trumpetinc wrote:
>> Don't know if it'll make any difference, but the way you are reading the
>> file is horribly inefficient. If the code you wrote is part of your test
>> times, you might want to re-try, but using this instead (I'm just tossing
>> this together - there might be typos):
>>
>> ByteArrayOutputStream baos = new ByteArrayOutputStream();
>> byte[] buf = new byte[8092];
>> int n;
>> while ((n = is.read(buf)) >= 0) {
>>     baos.write(buf, 0, n);
>> }
>> return baos.toByteArray();
>>
> I tried your suggestion above and it made no significant difference
> compared to doing the loading from iText. The fastest I could get my use
> case to work using this pre-loading concept was by loading the whole file
> in one shot using the code below.
>
> Applying the cumulative patch plus preloading the whole PDF using the code
> below, my original test-case now performs 7.74% faster than before,
> roughly 22% away from the competitor now ...
>
> btw the average response time numbers I was getting:
>
> - average response time of 77ms, original unchanged test-case, from the
>   office multi-processor-multi-core workstation
> - average response time of 15ms, original unchanged test-case, from home
>   using my MBP
>
> I attribute the huge difference between those two similar experiments
> mainly to having an SSD drive in my MBP ... the top hot spots reported
> by the profiler are related one way or another to IO, so it would be no
> wonder that with an SSD drive the response time improves by a factor of
> 5x. There are other differences though, e.g. OS, JVM version.
>
> Best regards,
> Giovanni
>
> private static byte[] file2ByteArray(String filePath) throws Exception {
>     InputStream input = null;
>     try {
>         File file = new File(filePath);
>         input = new BufferedInputStream(new FileInputStream(filePath));
>
>         byte[] buff = new byte[(int) file.length()];
>         input.read(buff);
>
>         return buff;
>     }
>     finally {
>         if (input != null) {
>             input.close();
>         }
>     }
> }
>

--
View this message in context: http://old.nabble.com/performance-follow-up-tp28322800p28352147.html
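One caveat on the quoted file2ByteArray(): a single InputStream.read(byte[]) call is allowed to return fewer bytes than the array holds, so the one-shot version can silently hand back a partially filled buffer. A minimal sketch of a variant that cannot do that, assuming the file fits in one array (the class name WholeFileLoader is made up for illustration and is not part of iText or of Giovanni's test case):

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

class WholeFileLoader {
    /** Reads the entire file into a byte[]; throws instead of returning a short read. */
    static byte[] file2ByteArray(String filePath) throws IOException {
        File file = new File(filePath);
        long length = file.length();
        if (length > Integer.MAX_VALUE - 8) {
            throw new IOException("File too large for a single array: " + filePath);
        }
        byte[] buff = new byte[(int) length];
        try (DataInputStream input = new DataInputStream(new FileInputStream(file))) {
            // readFully() keeps reading until the array is full,
            // or throws EOFException if the file ends early.
            input.readFully(buff);
        }
        return buff;
    }
}
```

The resulting array can be handed to PdfReader(byte[]) as suggested earlier in the thread. Whether this changes the benchmark numbers at all would have to be measured; the point is only correctness of the load.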
Re: [iText-questions] performance follow up
I'd love to discuss specific ideas on prediction - are you familiar enough with the PDF spec to provide any suggestions?

Some obvious ones are the xref table - but iText reads that entirely into memory one time and holds onto it, so it seems unlikely that pre-fetch would do much there (other than having the last 1MB of the file be the first block pre-fetched - but any sort of paging implementation would handle that already).

The rest... well, from my experience with this, you've got objects that refer to other objects that refer to other objects. And there's really no way to know where in the object graph you need to go until you parse and then go there.

So I think I'll need some concrete examples of how this might be done with PDF structure - just to get my creativity going!

- K

Mike Marchywka-2 wrote:
>
>> Parsing PDF requires a lot of random access. It tends to be chunked -
>> move to a particular offset in the file, then parse as a stream (this is
>> why paging makes sense, and why memory mapping is effective until the
>> file gets
>
> Yes, that is great, but instead of a generic MRU approach are
> there better predictions you can make, even start loading pages
> before having to wait later etc? Maybe multithreading makes
> sense here.
>
>> too big). But the parsing is incredibly complex. You can have nested
>> object structures, lots of alternative representations for the same type
>> of data, etc...
>
> Surely there are rules, and I'm sure this topic has been beaten
> to death in many CS courses (as have stats LOL). Profiling
> should point to some suspects. Algorithmic optimizations may
> be possible, as may just coding changes. Most compilers
> operate sequentially on input in maybe multiple passes; I'm
> sure you can find ideas easily in a variety of sources.
>
>> And we definitely don't know size of any of these structures ahead of
>> time.
>
> Well, you don't need to know it a week ahead of time, but
> you could maybe waste an access or two finding sizes if that
> can be done more quickly than just reading everything.
>

--
View this message in context: http://old.nabble.com/performance-follow-up-tp28322800p28346601.html
Re: [iText-questions] AWESOME! performance follow up
Don't know if it'll make any difference, but the way you are reading the file is horribly inefficient. If the code you wrote is part of your test times, you might want to re-try, but using this instead (I'm just tossing this together - there might be typos):

ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] buf = new byte[8092];
int n;
while ((n = is.read(buf)) >= 0) {
    baos.write(buf, 0, n);
}
return baos.toByteArray();

If loading the file into main memory makes any difference, that difference will be a measure of the impact of virtual<->native interface interaction. In effect, this is telling us whether the calls to file.read() should be replaced with file.read(byte[]).

From your results, are you seeing a big difference between iText and the competitor when you aren't flattening fields vs when you are flattening fields? Your profiling results aren't indicating bottlenecks in that area of the code.

If iText is much faster than the competitor in the non-flattening scenario, but slower than the competitor in the flattening scenario, I'm having a hard time reconciling the data presented so far.

Giovanni Azua-2 wrote:
>
> I am sooo sorry, the performance is worse with the change for pre-loading
> the PDFs in the test-case :(( the problem was that I ran the
> benchmarks with a small mistake in my test case ...
>
> Loading the HEADER demonstrates how to load flattened pre-formatted PDF
> part templates ...
>
> Loading the FOOTER demonstrates how to load PDF part templates containing
> fields that need to be populated.
>
> The mistake was to leave the HEADER fixed always ... so it would load only
> the flattened PDF template and not the footer (see below) [sigh] In any
> case it is good to know that loading flattened PDF parts is cheaper.
>
> I mistakenly ran the last benchmark like this:
>
> private static byte[] file2ByteArray(String filePath) throws Exception {
>     InputStream input = null;
>     ByteArrayOutputStream output = null;
>     try {
>         input = new BufferedInputStream(new FileInputStream(HEADER_PATH));
>         output = new ByteArrayOutputStream();
>         int data = input.read();
>         while (data != -1) {
>             output.write(data);
>
>             data = input.read();
>         }
>
>         return output.toByteArray();
>     }
>     finally {
>         if (input != null) {
>             input.close();
>         }
>
>         if (output != null) {
>             output.close();
>         }
>     }
> }
>

--
View this message in context: http://old.nabble.com/performance-follow-up-tp28322800p28346146.html
Re: [iText-questions] performance follow up
One thing occurs to me on the IO performance... If we are using a memory mapped file, the backing buffer is, by definition, on the native side of the virtual/native boundary. So reading one byte at a time requires a lot of round trips across that boundary. Even with memory mapping, it may make sense to do some sort of paging...

I'll have to think on that a bit.

- K

Giovanni Azua-2 wrote:
>
> Hello trumpetinc,
>
> On Apr 23, 2010, at 7:29 PM, trumpetinc wrote:
>
>> Giovanni - if your source PDFs are small enough, you might want to try
>> this, just to get a feel for the impact that IO blocking is having on
>> your results (read entire PDF into byte[] and use PdfReader(byte[]))
>>
> Trying it right now ...
>
>> The StringBuffer could definitely be replaced with a StringBuilder, and
>> it could be re-used instead of re-allocating for each call to nextToken()
>>
> This is what I applied yesterday with the patch I posted. It includes both
> changes in PRTokeniser: StringBuilder + reusing the same instances ... the
> improvement is somewhere around 6.2% faster for my test case.
>
> I want to try the one you suggest above ... and then I will post the new
> numbers plus the cumulative patch I have ...
>
> Best regards,
> Giovanni
>

--
View this message in context: http://old.nabble.com/performance-follow-up-tp28322800p28345102.html
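To make the "paging on top of memory mapping" idea concrete, here is a rough sketch, not iText code: a reader that copies one page at a time out of the MappedByteBuffer with a single bulk get() and then serves read() calls from an ordinary heap array. The class name PagedMappedReader and the 4096-byte page size are assumptions chosen for illustration, and whether this beats per-byte access to the mapped buffer on a given JVM is exactly what a benchmark would need to show.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class PagedMappedReader {
    private static final int PAGE_SIZE = 4096; // assumed page size, tune by measurement

    private final MappedByteBuffer mapped;
    private final byte[] page = new byte[PAGE_SIZE];
    private long pageStart = -1; // file offset of the data currently held in 'page'
    private int pageLength = 0;
    private long position = 0;

    /** Maps the whole file; FileChannel.map rejects files larger than Integer.MAX_VALUE bytes. */
    PagedMappedReader(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    void seek(long pos) {
        position = pos;
    }

    /** Returns the next byte as 0..255, or -1 at end of file. */
    int read() {
        if (position < 0 || position >= mapped.limit()) {
            return -1;
        }
        if (pageStart < 0 || position < pageStart || position >= pageStart + pageLength) {
            fillPage();
        }
        return page[(int) (position++ - pageStart)] & 0xff;
    }

    private void fillPage() {
        pageStart = (position / PAGE_SIZE) * PAGE_SIZE;
        pageLength = (int) Math.min(PAGE_SIZE, mapped.limit() - pageStart);
        mapped.position((int) pageStart);
        mapped.get(page, 0, pageLength); // one bulk copy instead of pageLength single-byte gets
    }
}
```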
Re: [iText-questions] performance follow up
Parsing PDF requires a lot of random access. It tends to be chunked - move to a particular offset in the file, then parse as a stream (this is why paging makes sense, and why memory mapping is effective until the file gets too big). But the parsing is incredibly complex. You can have nested object structures, lots of alternative representations for the same type of data, etc...

And we definitely don't know the size of any of these structures ahead of time.

Hmmm - just had a thought on IO performance. I'll post that in a separate message so we can keep the technical discussion separate.

- K

Mike Marchywka-2 wrote:
>
> You can have alt implementations in the mean time if you know
> size a priori. Ideally you would
> like to be able to operate on a stream and scrap random access.
>

--
View this message in context: http://old.nabble.com/performance-follow-up-tp28322800p28345099.html
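For files too big to map in one piece, the thread floats keeping a small number of mapped regions in a most-recently-used cache. A minimal sketch of that idea, assuming a LinkedHashMap in access order as the cache; the class name MappedRegionCache, the constructor parameters, and the eviction policy are all illustrative choices, not how RandomAccessFileOrArray works:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.LinkedHashMap;
import java.util.Map;

class MappedRegionCache {
    private final FileChannel channel;
    private final long fileSize;
    private final int regionSize;
    private final Map<Long, MappedByteBuffer> regions;

    MappedRegionCache(FileChannel channel, int regionSize, final int maxRegions) throws IOException {
        this.channel = channel;
        this.fileSize = channel.size();
        this.regionSize = regionSize;
        // accessOrder=true keeps the least recently used region first,
        // so removeEldestEntry drops it once the cache is over capacity.
        this.regions = new LinkedHashMap<Long, MappedByteBuffer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, MappedByteBuffer> eldest) {
                return size() > maxRegions;
            }
        };
    }

    /** Returns the byte at an absolute file offset as 0..255, or -1 outside the file. */
    int read(long offset) throws IOException {
        if (offset < 0 || offset >= fileSize) {
            return -1;
        }
        long index = offset / regionSize;
        MappedByteBuffer region = regions.get(index);
        if (region == null) {
            long start = index * regionSize;
            long length = Math.min(regionSize, fileSize - start);
            region = channel.map(FileChannel.MapMode.READ_ONLY, start, length);
            regions.put(index, region);
        }
        return region.get((int) (offset - index * regionSize)) & 0xff;
    }

    int cachedRegionCount() {
        return regions.size();
    }
}
```

Something like new MappedRegionCache(channel, 1 << 20, 15) would correspond to the "10 or 15 regions of maybe 1MB each" figure mentioned in the thread. Note that an evicted MappedByteBuffer is only unmapped when it is garbage collected, so this bounds the number of regions the cache references rather than the address space in use at any instant.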
Re: [iText-questions] performance follow up
Nah - I'm not saying that memory is cheap (or that cache misses aren't important) - just saying that int -> char casting isn't the culprit here.

The parser is a really low level algorithm that is responsible for reading ints from the bytes of a file and figuring out the appropriate value to convert them to. Sometimes it's a char, sometimes not. By the time the results of the parse are pulled from the parser, they are not ints anymore. It's not like we are carrying around a massive block of int[] to represent a string or anything like that.

The profiling results from Giovanni indicate that the call to isWhitespace accounts for less than 1% of runtime, while the calls to RandomAccessFileOrArray.read() (and its mapped IO derivatives) and PRTokeniser.nextToken() consume 17% combined. It's probably best to focus on those.

Mike Marchywka-2 wrote:
>
> So you are doing everything internally with 32 bit "chars?"
> Not a big deal, but if these are mostly zero there may be
> better ways to represent and save memory. You may say, "well
> RAM is cheap" but that doesn't matter since low level caches
> are fixed, but I guess you can get a bigger disk and say VM is unlimited.
>
>> are actually consuming run time, and this method isn't one of them (no
>> matter how much it could be optimized).
>
> The only person with data claimed otherwise :)
>

--
View this message in context: http://old.nabble.com/performance-follow-up-tp28322800p28344478.html
Re: [iText-questions] performance follow up
This tells us that the focus needs to be on PRTokeniser and RAFOA. For what it's worth, these have also shown up as the bottlenecks in profiling I've done in the parser package (not too surprising). I'll discuss each in turn: RandomAccessFileOrArray - I've done a lot of thinking on this one in the past. This is an interesting class that can have massive impact on performance, depending on how you use it. For example, if you load your source PDF entirely into memory, and pass the bytes into RAFOA, it will remove IO bottlenecks. Giovanni - if your source PDFs are small enough, you might want to try this, just to get a feel for the impact that IO blocking is having on your results (read entire PDF into byte[] and use PdfReader(byte[])) The next thing that I looked at was buffering (a naive use of RandomAccessFile is horrible for performance, and significant gains can be had by implementing a paging strategy). I actually implemented a paging RandomAccessFile and started work on rolling it into RAFOA last year, but my benchmarks showed that the memory mapped strategy that RAFOA uses had equivalent performance to the paging RAF implementation. These tests weren't conclusive, so there may still be some things to learn in this area. The one problem with the memory mapped strategy (in it's current implementation) is that really big source PDFs still can't be loaded into memory. This could be addressed by using a paging strategy on the mapped portion of the file - probably keep 10 or 15 mapped regions in an MRU cache (maybe 1MB in size each). For reference, the ugly (really ugly) hack that determines whether RAFOA will use memory mapped IO is the Document.plainRandomAccess static public variable (shudder). So what about the code paths in PRTokeniser.nextToken()? We've got a number of tight loops reading individual characters from the RAFOA. 
If the backing source isn't buffered, this would be a problem, but I don't know that that is really the issue here (it would be worth measuring, though...).

The StringBuffer could definitely be replaced with a StringBuilder, and it could be re-used instead of re-allocated for each call to nextToken() (this would probably help quite a bit, as I'll bet the default size of the backing buffer has to keep growing during dictionary reads).

Another thing that could make a difference is the ordering of the cases and ifs - for example, the default: branch turns around and does a check for

    (ch == '-' || ch == '+' || ch == '.' || (ch >= '0' && ch <= '9'))

Changing this to be:

    case '-':
    case '+':
    case '.':
    case '0':
    ...
    case '9':

may be better.

The loops that check for

    while (ch != -1 && ((ch >= '0' && ch <= '9') || ch == '.'))

could also probably be optimized by removing the ch != -1 check - the other conditions already ensure that the loop will exit if ch == -1.

It might be interesting to break the top-level parsing branches into separate functions so the profiler can tell us which of these main branches is consuming the bulk of the run time.

Those are the obvious low-hanging fruit that I see.

Final point: I've seen some comments suggesting inlining of some code. Modern VMs are quite good at doing this sort of inlining automatically - a test would be advisable before worrying about it too much. Having things split out actually makes it easier to use a profiler to determine where the bottleneck is.

One thing that is quite clear here is that we need to have some sort of benchmark that we can use for evaluation - for example, if I had a good benchmark test, I would have just tried the ideas above to see how they fared.

- K

Giovanni Azua-2 wrote:
> On Apr 22, 2010, at 11:18 PM, trumpetinc wrote:
>> I like your approach! A simple if (ch > 32) return false; at the very top
>> would give the most bang for the least effort (if you do go the bitmask
>> route, be sure to include unit tests!).
>
> Doing this change spares approximately two seconds out of the full
> workload, so it now shows 8s instead of 10s, and isWhitespace stays at 1%.
>
> The numbers below include two extra changes: the one from trumpetinc above
> and migrating all StringBuffer references to use StringBuilder instead.
>
> The top are now:
>
> PRTokeniser.nextToken          8%   77s   19'268'000 invocations
> RandomAccessFileOrArray.read   6%   53s  149'047'680 invocations
> MappedRandomAccessFile.read    3%   26s   61'065'680 invocations
> PdfReader.removeUnusedCode     1%   15s        6000 invocations
> PdfEncodings.convertToBytes    1%   15s    5'296'207 invocations
> PRTokeniser.nextVa
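The tokeniser suggestions in the message above (a switch instead of the if chain, a re-used StringBuilder, and dropping the redundant ch != -1 test) can be sketched roughly as follows. This is only an illustration under stated assumptions: TokenSketch, startsNumber, readNumber and at are hypothetical names, and none of this is the actual PRTokeniser code.

```java
// Hypothetical sketch; names and structure are assumptions, not iText code.
class TokenSketch {
    // One buffer re-used for every token instead of a new one per call.
    private final StringBuilder buf = new StringBuilder(64);

    /** Returns true if ch can start a numeric token (switch instead of an if chain). */
    static boolean startsNumber(int ch) {
        switch (ch) {
            case '-': case '+': case '.':
            case '0': case '1': case '2': case '3': case '4':
            case '5': case '6': case '7': case '8': case '9':
                return true;
            default:
                return false;
        }
    }

    /** Returns the byte at index i as an unsigned int, or -1 past the end (like read()). */
    private static int at(byte[] src, int i) {
        return i < src.length ? (src[i] & 0xff) : -1;
    }

    /** Reads an optional sign followed by digits and dots, starting at pos. */
    String readNumber(byte[] src, int pos) {
        buf.setLength(0); // reset length, keep the already-grown capacity
        int i = pos;
        int ch = at(src, i);
        if (ch == '-' || ch == '+') {
            buf.append((char) ch);
            ch = at(src, ++i);
        }
        // No explicit ch != -1 test: -1 is neither a digit nor '.',
        // so the loop exits at end of input anyway.
        while ((ch >= '0' && ch <= '9') || ch == '.') {
            buf.append((char) ch);
            ch = at(src, ++i);
        }
        return buf.toString();
    }
}
```

Whether any of this is measurably faster would still need the benchmark the message asks for; the sketch only shows that the behaviour is preserved.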
Re: [iText-questions] performance follow up
Yes - it needs to be int. Regardless, we need to focus on the things that are actually consuming run time, and this method isn't one of them (no matter how much it could be optimized).

Mike Marchywka-2 wrote:
> does this have to be int vs char or byte? I think earlier I suggested
> operating on byte[] instead of making a bunch of temp strings,
> but I don't know the context well enough to know if this makes sense.
> Certainly De Morgan can help, but casts and calls are not free either.
>
> Also, maybe the hotspot runtime has gotten better, but I have found in
> the past that lookup tables can quickly become competitive
> with bit operators (if your param is byte instead of int, a
> 256-entry table can tell you which classes the byte is a member of).

View this message in context: http://old.nabble.com/performance-follow-up-tp28322800p28343789.html
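For reference, the lookup-table idea from the quoted message could look something like the sketch below. WhitespaceTable is a hypothetical name, and the int parameter is kept on the assumption that the tokeniser passes through the -1 end-of-input value returned by read(), which is presumably why the method "needs to be int".

```java
// Hypothetical sketch of a 256-entry lookup table; not iText code.
class WhitespaceTable {
    private static final boolean[] WS = new boolean[256];

    static {
        // The six codes tested by the isWhitespace() method quoted in this thread.
        for (int c : new int[] {0, 9, 10, 12, 13, 32}) {
            WS[c] = true;
        }
    }

    /** Table-based test; the range check keeps -1 (end of input) and values above 255 safe. */
    static boolean isWhitespace(int ch) {
        return ch >= 0 && ch < 256 && WS[ch];
    }
}
```

The range check costs two comparisons, so it is not obvious that this beats the original chain of six; as the reply says, the method was not a significant consumer of run time in the first place.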
Re: [iText-questions] performance follow up
The semantics are different (the JSE call includes more characters in its definition of whitespace than the PDF spec does, and it does not treat NUL as whitespace). Not saying that it can't be easily done, but throwing an if statement at it and seeing what impact it has on performance is pretty easy also.

What was the overall percentage of time spent in this call in your tests?

Giovanni Azua-2 wrote:
> Hello,
>
> On Apr 22, 2010, at 10:59 PM, Giovanni Azua wrote:
>> PRTokeniser.isWhitespace is a simple boolean condition that just happens
>> to be called a gazillion times, e.g. 35'622'000 times for my test workload
>> ... if instead of doing it like:
>>
>>     public static final boolean isWhitespace(int ch) {
>>         return (ch == 0 || ch == 9 || ch == 10 || ch == 12 || ch == 13 || ch == 32);
>>     }
>>
>> we used a bitwise binary operator with the appropriate mask(s), there
>> could be some good performance gain ...
>
> The function already exists in
> http://java.sun.com/javase/6/docs/api/java/lang/Character.html#isWhitespace%28char%29
> I checked and it already uses bitwise binary operators with the right
> masks ... we would only need to inline it to avoid the function call
> costs.
>
> Best regards,
> Giovanni

View this message in context: http://old.nabble.com/performance-follow-up-tp28322800p28334828.html
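To make the difference in semantics concrete, the following small sketch lists the codes from 0 to 32 on which java.lang.Character.isWhitespace and the isWhitespace() method quoted in this thread disagree. WhitespaceSemantics, pdfWhitespace and mismatches are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical comparison helper; not part of iText.
class WhitespaceSemantics {
    /** Same test as the isWhitespace() method quoted in this thread. */
    static boolean pdfWhitespace(int ch) {
        return ch == 0 || ch == 9 || ch == 10 || ch == 12 || ch == 13 || ch == 32;
    }

    /** Codes in 0..32 where the two definitions give different answers. */
    static List<Integer> mismatches() {
        List<Integer> out = new ArrayList<Integer>();
        for (int ch = 0; ch <= 32; ch++) {
            if (pdfWhitespace(ch) != Character.isWhitespace(ch)) {
                out.add(ch);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [0, 11, 28, 29, 30, 31]: Java rejects NUL but accepts
        // VT and the four separator controls FS, GS, RS and US.
        System.out.println(mismatches());
    }
}
```

So swapping in the JSE call would change tokenising behaviour for those codes, in addition to the Unicode space characters above 32 that Java also accepts.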
Re: [iText-questions] performance follow up
I like your approach! A simple if (ch > 32) return false; at the very top would give the most bang for the least effort (if you do go the bitmask route, be sure to include unit tests!).

I know there were a lot of calls to this method, but I'm curious: in your profiling, how much of the total processing _time_ was spent in that routine? The if() would make this 6 times faster, but it's hard to believe that this call has any appreciable contribution to runtime.

Keep it up!

- K

Giovanni Azua-2 wrote:
> Now only 23.8% to go. We only need to make 4 more fixes like the last one
> and the gap will be gone :) The profiler shows there are still several
> bottlenecks topping the list which could also be easy fixes, e.g.
> PRTokeniser.isWhitespace is a simple boolean condition that just happens to
> be called a gazillion times, e.g. 35'622'000 times for my test workload ...
> if instead of doing it like:
>
>     public static final boolean isWhitespace(int ch) {
>         return (ch == 0 || ch == 9 || ch == 10 || ch == 12 || ch == 13 || ch == 32);
>     }
>
> we used a bitwise binary operator with the appropriate mask(s), there
> could be some good performance gain ...
>
> Best regards,
> Giovanni

View this message in context: http://old.nabble.com/performance-follow-up-tp28322800p28334733.html
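Both ideas discussed here, the early exit and the bitmask, can be sketched side by side as below. WhitespaceMask and its method names are hypothetical; only the reference method is taken from the thread. The 64-bit mask works because every whitespace code is at most 32.

```java
// Hypothetical sketch of the two proposed variants; not iText code.
class WhitespaceMask {
    // Bit n is set when code n is whitespace: 0, 9, 10, 12, 13, 32.
    private static final long MASK =
            (1L << 0) | (1L << 9) | (1L << 10) | (1L << 12) | (1L << 13) | (1L << 32);

    /** The method as quoted in the thread, used here as the reference. */
    static boolean reference(int ch) {
        return (ch == 0 || ch == 9 || ch == 10 || ch == 12 || ch == 13 || ch == 32);
    }

    /** Early-exit variant: most bytes in a PDF are above 32 and leave immediately. */
    static boolean fastPath(int ch) {
        if (ch > 32) return false;
        return reference(ch);
    }

    /** Bitmask variant: one range check, then a shift and a mask. */
    static boolean bitmask(int ch) {
        if (ch < 0 || ch > 32) return false; // also covers -1 at end of input
        return ((MASK >>> ch) & 1L) != 0;
    }
}
```

A unit test along the lines requested in the message would simply compare all three methods over a range such as -1 to 255.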
Re: [iText-questions] Low level browsing of document structure
Look at the parser package (com.itextpdf.text.pdf.parser) - PdfContentReaderTool is a good starting point. I think you'll find that this will greatly simplify your efforts.

Only caveat: I don't know if the parser has been ported to iTextSharp yet.

- K

Mircea Zahan wrote:
> Hi all,
>
> Everything is just peachy with iText when one
> only wants to write PDFs. But when it comes to
> reading, the documentation says almost nothing.
> Only basics, like metadata, pages etc.
>
> My problem: I need to obtain all the lines, curves
> etc. from a PDF together with their companions, that
> is, transformation matrices, colors etc. In short, all
> the graphic content.
>
> I have read everything that I could get my hands
> on and couldn't find a single example of how
> that can be achieved.
>
> Does anyone know how to do that?
>
> I also need to get the internal ID of an object,
> the one looking like: 20 R, 30 R etc. That also I
> couldn't figure out and didn't find anywhere.
> Any luck with it?
>
> Most grateful,
> Mircea.

View this message in context: http://old.nabble.com/Low-level-browsing-of-document-structure-tp28078245p28084870.html
Re: [iText-questions] Rotate Page After Adding Text to Document
I went through *exactly* the same source review last month when I started hitting it. I was quite happy to see that iText actually sets the rotation in the page dictionary (I needed it for constructing unit tests), but I agree that the way it happens is a bit hard to follow.

If you think about it from the way the PDF spec is written, though, you could see how this implementation evolved: There is a dictionary entry for the unrotated page size, plus the rotation entry. The rectangle that the user has been working with has been in rotated coordinates, so to set the unrotated page size, they have to unrotate.

On the content side, iText rotates the coordinate system just to make life easier for the user. And PDF has a complete disassociation between rotation at the page level and the required rotation at the content level. Confusing as all get out.

In my parsing code, I am currently ignoring the page rotation entirely, so rotated pages wind up with text alignment being off by 90 degrees (generally speaking, not an issue for text extraction because all of the text rotates - but for rendering or geometric filtering, it is an issue). At some point, I'll have to address this - probably by applying yet another CTM when computing user space coordinates. Oy.

- Kevin

Mark Storer-2 wrote:
> Looking through the Rectangle.rotate() -> Pdf-structures-in-the-output
> code, I think we might have An Issue. Woah woah woah... let me check the
> trunk instead of my red-headed-stepchild-branch of 2.0.whatever-it-was.
>
>     Rectangle.rotate() { // yep, no changes
>         Rectangle rect = new Rectangle(lly, llx, ury, urx);
>         rect.rotation = rotation + 90;
>         rect.rotation %= 360;
>         return rect;
>     }
>
> It swaps the x's and y's, and sets the rotation member.
>
> In PdfDocument.newPage(), we find The Following Code:
>
>     // [U1] page size and rotation
>     int rotation = pageSize.getRotation();
>     ...
>     PdfPage page = new PdfPage(new PdfRectangle(pageSize, rotation),
>         thisBoxSize, resources, rotation);
>
> So rotation gets passed to the new PdfRectangle and to the new PdfPage:
>
>     public PdfRectangle(float llx, float lly, float urx, float ury, int rotation) {
>         super();
>         if (rotation == 90 || rotation == 270) {
>             this.llx = lly;
>             this.lly = llx;
>             this.urx = ury;
>             this.ury = urx;
>         }
>         else {
>             this.llx = llx;
>             this.lly = lly;
>             this.urx = urx;
>             this.ury = ury;
>         }
>         super.add(new PdfNumber(this.llx));
>         super.add(new PdfNumber(this.lly));
>         super.add(new PdfNumber(this.urx));
>         super.add(new PdfNumber(this.ury));
>     }
>
> PdfRectangle swaps the coordinates *again*. It doesn't store the rotation
> value, just makes use of it.
>
> And...
>
>     PdfPage(PdfRectangle mediaBox, HashMap boxSize,
>             PdfDictionary resources, int rotate) {
>         super(PAGE);
>         this.mediaBox = mediaBox;
>         put(PdfName.MEDIABOX, mediaBox);
>         put(PdfName.RESOURCES, resources);
>         if (rotate != 0) {
>             put(PdfName.ROTATE, new PdfNumber(rotate)); // *** This is the only place it's used ***
>         }
>         for (int k = 0; k < boxStrings.length; ++k) {
>             PdfObject rect = boxSize.get(boxStrings[k]);
>             if (rect != null)
>                 put(boxNames[k], rect);
>         }
>     }
>
> So we've swapped it back, and stored the value in the PdfDictionary for
> the page ONLY. It's not retrieved anywhere in PdfPage...
>
> Ah! In PdfContent, the rotation is taken from the original Rectangle
> again and used in a transformation matrix, just like Kev(in?) said.
>
> So while swapping the rectangle coordinates twice is certainly ODD, it
> doesn't look like there's anything genuinely broken in there... just an
> "Even Number of Sign Errors". Those are fine as long as you find both of
> them. Finding one and having the "correct" output anyway is a bit
> maddening. *twitch*
>
> --Mark Storer
> Senior Software Engineer
> Cardiff.com
>
> #include
> typedef std::Disclaimer DisCard;
>
>> -----Original Message-----
>> From: trumpetinc [mailto:forum_...@trumpetinc.com]
>> Sent: Tuesday, January 19, 2010 9:00 PM
>> To: itext-questions@lists.sourceforge.net
>> Subject: Re: [iText-questions] Rotate Page After Adding Text to Document
>>
>> As a point of clarification, I'm pretty sure that, in addition to swapping
>> width and height, rotate() signals PdfDocument to add a rotation cm entry to
>> the beginning of the content stream, and adjusts
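The "swap, then swap back" behaviour traced in the quoted walkthrough can be reproduced with a few lines of plain Java. This is a simplified stand-in, not iText code: RotationSketch, Rect and mediaBox are hypothetical names that only mirror the logic of the quoted Rectangle.rotate() and PdfRectangle constructor.

```java
// Hypothetical stand-in classes mirroring the quoted logic; not iText code.
class RotationSketch {
    /** Four coordinates plus a rotation, like the rectangle the user works with. */
    static final class Rect {
        final float llx, lly, urx, ury;
        final int rotation;

        Rect(float llx, float lly, float urx, float ury, int rotation) {
            this.llx = llx;
            this.lly = lly;
            this.urx = urx;
            this.ury = ury;
            this.rotation = rotation;
        }

        /** First swap: exchange x and y and advance the rotation by 90 degrees. */
        Rect rotate() {
            return new Rect(lly, llx, ury, urx, (rotation + 90) % 360);
        }
    }

    /** Second swap: for 90 or 270 degrees the coordinates are exchanged back. */
    static float[] mediaBox(Rect r) {
        if (r.rotation == 90 || r.rotation == 270) {
            return new float[] {r.lly, r.llx, r.ury, r.urx};
        }
        return new float[] {r.llx, r.lly, r.urx, r.ury};
    }
}
```

Starting from a 612 x 792 portrait rectangle, rotate() yields a 792 x 612 rectangle with rotation 90, while mediaBox() on that result gives back 0, 0, 612, 792: the unrotated size that goes into the page dictionary alongside the separate rotation entry.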
Re: [iText-questions] Rotate Page After Adding Text to Document
As a point of clarification, I'm pretty sure that, in addition to swapping width and height, rotate() signals PdfDocument to add a rotation cm entry to the beginning of the content stream, and adjusts the rotation dictionary entry for the page.

And I completely agree with the 'messy code for dealing with it' comment. As an example, ImportedPage doesn't preserve the page rotation from the source, which can cause all sorts of mayhem (esp. because the page rotation implies an awkward change in origin).

- K

Mark Storer-2 wrote:
> Ah. So you don't want to spin-the-pages-contents-sideways, you want
> landscape-vs-portrait.
>
> "Rotation" isn't the word you want. You just want to change the page size
> from 8.5x11 to 11x8.5. By the way, Rectangle.rotate() doesn't really
> rotate the page either, it swaps the width/height. In PDF, there's a
> concept of page rotation independent of a page's physical dimensions,
> which can lead to all manner of Interesting Confusion (and messy code for
> dealing with it).

View this message in context: http://old.nabble.com/Rotate-Page-After-Adding-Text-to-Document-tp27234067p27236838.html
Re: [iText-questions] Rotate Page After Adding Text to Document
Random thought (and more of a mental exercise than a real solution):

I wonder if it's possible to insert an object reference for the parameters to a cm operation in a content stream... I know, for example, that it's possible to do this with text operations, so I'd imagine that it's possible with other operations.

I'll leave the arduous task of actually doing such a thing as an exercise to the reader ;-)

That won't, of course, take care of re-flowing the text (if that was desired), but I don't think that stamping the content onto rotated pages will do that either.

View this message in context: http://old.nabble.com/Rotate-Page-After-Adding-Text-to-Document-tp27234067p27234542.html
Re: [iText-questions] Unit Testing, Stress Testing, Profiling...
For what it's worth, I've been able to create some pretty good content-based unit tests using the parser... I have a filtering parser that I've put together (haven't committed it yet) that allows you to specify a region of the page to extract text from. This makes it pretty easy to determine if text was placed in the correct location.

It won't work for everything, of course, but for functionality related to layout, etc... this may be useful.

Cheers,

- K

Paulo Soares-3 wrote:
> As Mark said, unit tests for PDF are virtually impossible because it's
> extremely difficult to verify that a PDF is correct other than by opening
> and looking at it.
>
> Paulo
>
>> -----Original Message-----
>> From: Mark Storer [mailto:msto...@autonomy.com]
>> Sent: Thursday, January 14, 2010 5:15 PM
>> To: Post all your questions about iText here
>> Subject: Re: [iText-questions] Unit Testing, Stress Testing, Profiling...
>>
>> All the unit tests are available in the source downloads at
>> http://sourceforge.net/projects/itext/files/ . You can also
>> get the trunk from SVN at
>>
>> I don't believe there's anything in the way of performance
>> testing in there, just the basic "yes: it ran, no: it didn't
>> explode, yes: there's an output file" stuff. GOOD unit tests
>> for PDF are Very Hard.
>>
>> --Mark Storer
>> Senior Software Engineer
>> Cardiff.com
>>
>> #include
>> typedef std::Disclaimer DisCard;
>>
>> > -----Original Message-----
>> > From: Ghady Diab [mailto:ghady.d...@live.com]
>> > Sent: Wednesday, January 13, 2010 11:54 AM
>> > To: itext-questions@lists.sourceforge.net
>> > Subject: Re: [iText-questions] Unit Testing, Stress Testing, Profiling...
>> >
>> > Hey,
>> >
>> > It's Ghady DIAB from Lebanon. I'm really interested in your iTextSharp
>> > library (C#), and I'm working on a small project using it for
>> > my university.
>> >
>> > Is there a way I can get the Unit Tests you did for this library as well as
>> > stress testing and profiling documents (results).
>> >
>> > If these documents are not available for free, I'll pay. Just let me know if
>> > they're available and how can I access them.
>> >
>> > Thanks in advance.
>> >
>> > Respectfully,
>> > Ghady DIAB
>> > --
>> > From: "Bruno Lowagie"
>> > Sent: Wednesday, January 13, 2010 9:20 PM
>> > To: "Ghady Diab"
>> > Cc:
>> > Subject: Re: Unit Testing, Stress Testing, Profiling...
>> >
>> > > Ghady Diab wrote:
>> > >> Hey,
>> > >> It's Ghady DIAB from Lebanon. I'm really interested in your iTextSharp
>> > >> library (C#), and I'm working on a small project using it for my
>> > >> university.
>> > >> Is there a way I can get the Unit Tests you did for this library as well
>> > >> as stress testing and profiling documents (results).
>> > >> If these documents are not available for free, I'll pay. Just let me
>> > >> know if they're available and how can I access them.
>> > >
>> > > That's not really a sales question. The people who write unit tests are on
>> > > the mailing list (Xavier Le Vourch, Kevin Day); you should post your
>> > > question there. The address is itext-questions@lists.sourceforge.net but
>> > > you should register first as I'm the only one who can approve questions,
>> > > and I'm teaching iText in Paris the next two days (meaning: I probably
>> > > won't be online much).
>> > > best regards,
>> > > Bruno
Re: [iText-questions] How do I get the position of an image in a PDF file?
I just committed some new (highly experimental) code to SVN (rev 4221). See com.itextpdf.text.pdf.parser.ImageRenderListener

To see an example of registering a RenderListener with a PdfContentParser, see PdfTextExtractor.getTextFromPage() in the same package (you'll pass your own RenderListener in to the PdfContentStreamProcessor constructor).

The image part of this is all highly experimental and not at all well tested (in fact, during your travels, if you happen to come up with some unit tests, including smallish PDF files if necessary, be sure to let me know). Most of my effort right now is focused on improving text parsing, but if you find things going on with the image side, let me know.

I'm also very open to suggestions for architectural changes related to the structure of the render listeners (right now, the text and image listeners have been intentionally kept separate - maybe that is good, maybe not).

I look forward to your feedback,

- K

On Fri, Nov 20, 2009 at 2:29 PM, trumpetinc2 wrote:
[...]
> I can put something together if you are interested in testing it out and
> providing feedback

Sure, I'd like to give it a try.

Thanks
Larry

View this message in context: http://old.nabble.com/How-do-I-get-the-position-of-an-image-in-a-PDF-file--tp26417166p26884194.html