Re: [iText-questions] Clarification on PdfCopy.freeReader()

2010-04-29 Thread trumpetinc

ok - that's what it looked like.  So nothing bad would happen, we'd just wind
up with resource content streams getting added multiple times.  I think that
PdfSmartCopy mostly addresses the downsides of that...

Thanks,

- K


Paulo Soares-3 wrote:
 
 freeReader() makes the writer instance forget about that particular pdf.
 You may open the same pdf again but it will be like a different pdf and
 won't use any of the shared resources from the first one that you could
 use if the pdf was not freed. freeReader() is meant to be used when you're
 done with the doc, not if you intend to use it later.
 
 Paulo
 
 
 From: 'Kevin Day' [ke...@trumpetinc.com]
 Sent: Thursday, April 29, 2010 12:55 AM
 To: IText Questions
 Subject: [iText-questions] Clarification on PdfCopy.freeReader()
 
 I'm trying to get a handle on the implications of using the freeReader()
 call of PdfCopy.  If I call this, can I not add more pages from the
 'freed' reader to the same PdfCopy instance?  Or is it safe to call:
 
 pdfCopy.freeReader(myReader);
 PdfImportedPage imported = pdfCopy.getImportedPage(myReader, 1);
 pdfCopy.addPage(imported);
 pdfCopy.freeReader(myReader);
 imported = pdfCopy.getImportedPage(myReader, 2);
 pdfCopy.addPage(imported);
 
 
 Obviously, in a real app, these calls would not be in the same block of
 code.
 
 
 This is kind of a long way of asking if it wouldn't be better to just
 implicitly call freeReader() in addPage() if the reader is different from
 the currentReader.  I suspect that this would have performance
 implications for readers opened in partial mode, but I wanted to check my
 thinking.
 
 Thanks,
 
 - K
 
 
 
 
 
 
 
 
 --
 
 ___
 iText-questions mailing list
 iText-questions@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/itext-questions
 
 Buy the iText book: http://www.itextpdf.com/book/
 Check the site with examples before you ask questions:
 http://www.1t3xt.info/examples/
 You can also search the keywords list:
 http://1t3xt.info/tutorials/keywords/
 

-- 
View this message in context: 
http://old.nabble.com/Clarification-on-PdfCopy.freeReader%28%29-tp28395359p28403248.html
Sent from the iText - General mailing list archive at Nabble.com.



Re: [iText-questions] Performance when flattening form fields

2010-04-26 Thread trumpetinc

Mike - can we please reserve this thread for a technical discussion of the
merits of the proposal?

I'd be happy to have a conversation in a separate thread regarding how iText
works.

- K


Mike Marchywka-2 wrote:
 
 
 
 
 
 
 
 
 
 
 
 
 
 Date: Sun, 25 Apr 2010 22:14:02 -0700
 From: forum_...@trumpetinc.com
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] Performance when flattening form fields


 After more digging, I'm wondering if the place to do this wouldn't be in the
 PdfCopy.PageStamp class? It seems like PageStamp.alterContents() could do
 the same flattening operation that PdfStamper does.

 The ideal would be to factor out the behavior so the code isn't duplicated
 in both PdfCopy and PdfStamper...
 
 I guess I have the larger question of what exactly parsing is.
 That is, it seems that generally you use iText to 1) read in something, often
 an existing PDF, 2) do some stuff, then 3) write out a PDF. Presumably
 as you go through step 2, you are assembling or compiling a bunch
 of structures that allow you to do step 3 but are more optimized
 for manipulating and editing the nascent PDF.
 If I understand your earlier comments, you apparently don't actually
 have a generic PDF parser for step 1 that works with all sequences
 you could put into step 2. Now, of course, more generally the
 above approach doesn't scale, as you would always hope to stream
 to some extent: read what you need, write what you can, etc.
 However, that could probably be hidden somewhat in the implementation
 of the classes for each step.
 So, instead of things like PdfCoolFeature.doSomething(byte[] pdffile)
 you have PdfCoolFeature.doSomething(ParsedPdfOperand pdflikething)
 where the second signature takes a parameter that is generally
 optimized for a broad class of common operations.
  

 Does anyone see any technical issues with this as a strategy?

 - K


 'Kevin Day' wrote:








 I've been doing some digging into the performance question that Giovanni
 Azua has posted about.
  
 Some of his findings (using StringBuilder, etc...) are solid improvements
 to overall iText performance - however, the crux of the performance
 difference he is seeing between iText and the competing solution is not
 low level.  It's a high level issue.
  
 Here's what's going on:
  
 His specific use case involves stamping headers and footers onto
 pages.  The footer contains AcroFields that must be flattened prior
 to stamping.
  
 The performance hit is coming from the fact that, in order to flatten and
 apply the footer, he is having to:
  
 1.  Construct a PDF using PdfStamper
 2.  Write output to a byte array output stream
 3.  Re-parse the BAOS into a PdfReader
 4.  Import the page from the reader for use as a stamp
  
 While this is functional, it is certainly not performant.
  
 A much, much faster technique would be to do the flattening to the
 *reader*, then just import the page to the output writer.  This
 avoids the awkward creation of the temporary PdfReader.
  
  
 So, the performance delta is not caused so much by iText's low level
 implementation (although the performance improvements that Giovanni has
 suggested will help to make iText even faster than it already is) - the
 delta is really caused by an awkward operation forced on the user by the
 framework.
  
  
 So, are there any fundamental reasons to not do flattening, etc... to the
 PdfReader?  My first look at the code indicates that it may be
 possible to factor this out of PdfStamper (basically, instead of adjusting
 the AcroFields dictionary and content streams in the
 PdfStamper/PdfCopy/etc... output, we'd make those adjustments to the
 PdfReader).
  
 I'm thinking of something along the lines of:
  
 PdfFormFlattener(PdfReader).flatten(pageNumber)
  
 Maybe with supplemental methods for flattenNamedFields(pageNumber),
 flattenFieldsOfType(pageNumber)
  
 Thoughts?
  
 - K
  




Re: [iText-questions] Performance when flattening form fields

2010-04-25 Thread trumpetinc

After more digging, I'm wondering if the place to do this wouldn't be in the
PdfCopy.PageStamp class?  It seems like PageStamp.alterContents() could do
the same flattening operation that PdfStamper does.

The ideal would be to factor out the behavior so the code isn't duplicated
in both PdfCopy and PdfStamper...

Does anyone see any technical issues with this as a strategy?

- K


 

-- 
View this message in context: 
http://old.nabble.com/Performance-when-flattening-form-fields-tp28357673p28360908.html
Sent from the iText - General mailing list archive at Nabble.com.




Re: [iText-questions] performance follow up

2010-04-24 Thread trumpetinc

If the file is being entirely pre-loaded, then I doubt that IO blocking is a
significant contributing factor to your test.

I think that the best clue here may be the difference between performance
with form flattening and without form flattening.  Just to confirm, am I
right in saying that iText outperforms the competitor by a significant
amount in the non-flattening scenario?  If that's the case, then it seems
like we should see significant differences in the profiling results between
the flattening and non-flattening scenarios in iText.

Would you be willing to post the profiling results for both cases so we can
see which code paths are consuming the most runtime in each?

Another possibility if the profiling results show similar hotspots is that
the form flattening algorithms in iText are using the hotspot areas a lot
more than in the non-flattening case.  There may be a bunch of redundant
reads or something in the flattening case.

Let's take a look at the profiling results and see if we can draw any
conclusions about where to go next.

BTW - which profiler are you using?  Are you able to expand each of the
hotspot code paths and see the actual call path that is causing the
bottleneck?  I use jvisualvm, and the results of expanding the hotspot call
trees can be quite illuminating.

What I really would like is to get ahold of your two benchmark tests (with
and without flattening) so I can run it on my system - do you have anything
you can package up and share?

- K


Giovanni Azua-2 wrote:
 
 Hello,
 
 On Apr 23, 2010, at 10:50 PM, trumpetinc wrote:
 Don't know if it'll make any difference, but the way you are reading the
 file is horribly inefficient.  If the code you wrote is part of your test
 times, you might want to re-try, but using this instead (I'm just tossing
 this together - there might be typos):
 
 ByteArrayOutputStream baos = new ByteArrayOutputStream();
 byte[] buf = new byte[8092];
 int n;
 while ((n = is.read(buf)) >= 0) {
  baos.write(buf, 0, n);
 }
 return baos.toByteArray();
 
 I tried your suggestion above and it made no significant difference
 compared to doing the loading from iText. The fastest I could get my use
 case to work using this pre-loading concept was by loading the whole file
 in one shot using the code below.
 
 Applying the cumulative patch plus preloading the whole PDF using the code
 below, my original test-case now performs 7.74% faster than before,
 roughly 22% away from competitor now ...  
 
 btw the average response time numbers I was getting:
 
 - average response time of 77ms original unchanged test-case from the
 office multi-processor-multi-core workstation 
 - average response time of 15ms original unchanged test-case from home
 using my MBP
 
 I attribute the huge difference between those two similar experiments
 mainly to having an SSD drive in my MBP ... the top hot spots reported
 by the profiler are related one way or another to IO, so it would be no
 wonder that with an SSD drive the response time improves by a factor of
 5x. There are other differences though, e.g. OS, JVM version.
 
 Best regards,
 Giovanni
 
 private static byte[] file2ByteArray(String filePath) throws Exception {
   InputStream input = null;
   try {
     File file = new File(filePath);
     input = new BufferedInputStream(new FileInputStream(filePath));

     byte[] buff = new byte[(int) file.length()];
     input.read(buff);

     return buff;
   }
   finally {
     if (input != null) {
       input.close();
     }
   }
 }
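
One caveat on the helper above: a single InputStream.read(byte[]) call is not guaranteed to fill the buffer, so for larger files it can return a partially filled array. A minimal sketch of two fully-reading variants follows, assuming Java 7 or later; the class and method names are illustrative and not part of the original post.

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

final class FilePreloader {
    /** Reads the whole file; Files.readAllBytes keeps reading until end of file. */
    static byte[] readAll(String filePath) throws IOException {
        return Files.readAllBytes(Paths.get(filePath));
    }

    /** Stream-based variant: readFully blocks until the buffer is full or throws EOFException. */
    static byte[] readAllWithStream(String filePath) throws IOException {
        File file = new File(filePath);
        byte[] buff = new byte[(int) file.length()];
        try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
            in.readFully(buff);
        }
        return buff;
    }
}
```

Either result could then be handed to PdfReader(byte[]) as suggested elsewhere in the thread.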
 
 
 
 

-- 
View this message in context: 
http://old.nabble.com/performance-follow-up-tp28322800p28352147.html
Sent from the iText - General mailing list archive at Nabble.com.




Re: [iText-questions] performance follow up

2010-04-23 Thread trumpetinc

Yes - it needs to be int.  Regardless, we need to focus on the things that
are actually consuming run time, and this method isn't one of them (no
matter how much it could be optimized).


Mike Marchywka-2 wrote:
 
 
 
 
 Does this have to be int vs char or byte? I think earlier I suggested
 operating on byte[] instead of making a bunch of temp strings,
 but I don't know the context well enough to know if this makes sense.
 Certainly De Morgan can help, but casts and calls are not free either.
  
 Also, maybe the hotspot runtime has gotten better, but I have found in
 the past that lookup tables can quickly become competitive
 with bit operators (if your param is byte instead of int, a
 256 entry table can tell you which classes the byte is a member of).
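
A minimal sketch of the 256-entry table idea, for illustration only. The class name, the flag constants and the choice of three character classes are assumptions, not iText code; the whitespace and delimiter sets are the ones defined by the PDF specification.

```java
final class PdfCharClasses {
    static final int WHITESPACE = 1;
    static final int DELIMITER = 2;
    static final int DIGIT = 4;

    // One entry per byte value; each entry is a bit set of the classes it belongs to.
    private static final byte[] CLASSES = new byte[256];

    static {
        for (int ch : new int[] {0, 9, 10, 12, 13, 32}) {
            CLASSES[ch] |= WHITESPACE;
        }
        for (char ch : "()<>[]{}/%".toCharArray()) {
            CLASSES[ch] |= DELIMITER;
        }
        for (char ch = '0'; ch <= '9'; ch++) {
            CLASSES[ch] |= DIGIT;
        }
    }

    /** True if ch (0..255) belongs to any class in mask; EOF (-1) belongs to none. */
    static boolean is(int ch, int mask) {
        return ch >= 0 && ch < 256 && (CLASSES[ch] & mask) != 0;
    }
}
```

A single lookup such as PdfCharClasses.is(ch, WHITESPACE | DELIMITER) then answers "does this byte end a token?" without a chain of comparisons. Whether it beats the comparisons in practice would have to be measured.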
  
 
 

-- 
View this message in context: 
http://old.nabble.com/performance-follow-up-tp28322800p28343789.html
Sent from the iText - General mailing list archive at Nabble.com.




Re: [iText-questions] performance follow up

2010-04-23 Thread trumpetinc

This tells us that the focus needs to be on PRTokeniser and
RandomAccessFileOrArray (RAFOA).  For what it's worth, these have also shown
up as the bottlenecks in profiling I've done in the parser package (not too
surprising).

I'll discuss each in turn:

RandomAccessFileOrArray - I've done a lot of thinking on this one in the
past.  This is an interesting class that can have massive impact on
performance, depending on how you use it.  For example, if you load your
source PDF entirely into memory, and pass the bytes into RAFOA, it will
remove IO bottlenecks.  

Giovanni - if your source PDFs are small enough, you might want to try this,
just to get a feel for the impact that IO blocking is having on your results
(read entire PDF into byte[] and use PdfReader(byte[]))


The next thing that I looked at was buffering (a naive use of
RandomAccessFile is horrible for performance, and significant gains can be
had by implementing a paging strategy).  I actually implemented a paging
RandomAccessFile and started work on rolling it into RAFOA last year, but my
benchmarks showed that the memory mapped strategy that RAFOA uses had
equivalent performance to the paging RAF implementation.

These tests weren't conclusive, so there may still be some things to learn
in this area.

The one problem with the memory mapped strategy (in its current
implementation) is that really big source PDFs still can't be loaded into
memory.  This could be addressed by using a paging strategy on the mapped
portion of the file - probably keep 10 or 15 mapped regions in an MRU cache
(maybe 1MB in size each).
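
A rough sketch of such a paging cache, for illustration. It is a simplification under stated assumptions: it caches plain heap byte[] pages read through RandomAccessFile rather than mapped regions, it keeps the most recently used pages by evicting the least recently used one, and the class name and parameters are hypothetical, not iText code.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.LinkedHashMap;
import java.util.Map;

final class PagedFileReader implements AutoCloseable {
    private final RandomAccessFile file;
    private final int pageSize;
    private final long length;
    private final Map<Long, byte[]> pages;

    PagedFileReader(String path, int pageSize, final int maxPages) throws IOException {
        this.file = new RandomAccessFile(path, "r");
        this.pageSize = pageSize;
        this.length = file.length();
        // accessOrder = true keeps entries ordered by last access, so the
        // eldest entry is always the least recently used page.
        this.pages = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > maxPages;
            }
        };
    }

    /** Returns the unsigned byte at pos, or -1 if pos is outside the file. */
    int read(long pos) throws IOException {
        if (pos < 0 || pos >= length) {
            return -1;
        }
        long pageIndex = pos / pageSize;
        byte[] page = pages.get(pageIndex);
        if (page == null) {
            page = loadPage(pageIndex);
            pages.put(pageIndex, page);
        }
        return page[(int) (pos % pageSize)] & 0xff;
    }

    private byte[] loadPage(long pageIndex) throws IOException {
        long start = pageIndex * pageSize;
        byte[] page = new byte[(int) Math.min(pageSize, length - start)];
        file.seek(start);
        file.readFully(page);
        return page;
    }

    int cachedPages() {
        return pages.size();
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

With the numbers floated above, this would be constructed as new PagedFileReader(path, 1 << 20, 15), which bounds memory use at roughly 15MB regardless of the size of the source PDF.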

For reference, the ugly (really ugly) hack that determines whether RAFOA
will use memory mapped IO is the Document.plainRandomAccess static public
variable (shudder).




So what about the code paths in PRTokeniser.nextToken()?

We've got a number of tight loops reading individual characters from the
RAFOA.  If the backing source isn't buffered, this would be a problem, but I
don't know that is really the issue here (it would be worth measuring
though...)

The StringBuffer could definitely be replaced with a StringBuilder, and it
could be re-used instead of re-allocating for each call to nextToken()
(this would probably help quite a bit, as I'll bet the default size of the
backing buffer has to keep growing during dictionary reads).
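
The reuse pattern could look roughly like the sketch below. It is a toy word reader written for illustration, not the PRTokeniser code; the class and method names are made up.

```java
final class TokenBuffer {
    // One builder per tokeniser instance, reused across calls (not thread-safe).
    private final StringBuilder sb = new StringBuilder(64);

    /** Collects characters of src from index 'from' up to the next whitespace. */
    String nextWord(String src, int from) {
        sb.setLength(0); // resets the content but keeps the already-grown backing array
        int i = from;
        while (i < src.length() && !Character.isWhitespace(src.charAt(i))) {
            sb.append(src.charAt(i++));
        }
        return sb.toString();
    }
}
```

Because setLength(0) does not shrink the backing array, the builder only has to grow while it is still meeting longer tokens than it has seen before.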

Another thing that could make a difference is the ordering of the case and
if branches - for example, the default: branch turns around and does a check
for (ch == '-' || ch == '+' || ch == '.' || (ch >= '0' && ch <= '9')).
Changing this to be:

case '-':
case '+':
case '.':
case '0':
...
case '9':

May be better.
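
The two forms can be put side by side to confirm that they accept exactly the same characters; whether the switch is actually faster would need measuring. A sketch with illustrative names, not the iText source:

```java
final class NumberStart {
    /** if-chain form of the check described above. */
    static boolean viaIf(int ch) {
        return ch == '-' || ch == '+' || ch == '.' || (ch >= '0' && ch <= '9');
    }

    /** switch form: the labels are dense, so the compiler can emit a jump table. */
    static boolean viaSwitch(int ch) {
        switch (ch) {
            case '-':
            case '+':
            case '.':
            case '0':
            case '1':
            case '2':
            case '3':
            case '4':
            case '5':
            case '6':
            case '7':
            case '8':
            case '9':
                return true;
            default:
                return false;
        }
    }
}
```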


The loops that check for while (ch != -1 && ((ch >= '0' && ch <= '9') || ch
== '.')) could also probably be optimized by removing the ch != -1 check
- the other conditions already fail when ch == -1, so the loop still exits.
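
The redundancy claim is easy to check exhaustively. The sketch below reads the loop condition as "ch is not EOF and is a digit or a dot" and compares it with the same test minus the EOF check, for -1 and every byte value; the names are illustrative.

```java
final class DigitLoopCheck {
    static boolean withEofCheck(int ch) {
        return ch != -1 && ((ch >= '0' && ch <= '9') || ch == '.');
    }

    static boolean withoutEofCheck(int ch) {
        return (ch >= '0' && ch <= '9') || ch == '.';
    }

    /** True if both forms agree for EOF (-1) and every byte value 0..255. */
    static boolean equivalent() {
        for (int ch = -1; ch <= 255; ch++) {
            if (withEofCheck(ch) != withoutEofCheck(ch)) {
                return false;
            }
        }
        return true;
    }
}
```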


It might be interesting to break the top level parsing branches into
separate functions so the profiler can tell us which of these main branches
is consuming the bulk of the run time.


Those are the obvious low hanging fruit that I see.

Final point:  I've seen some comments suggesting inlining of some code. 
Modern VMs are quite good at doing this sort of inlining automatically - a
test would be advisable before worrying about it too much.  Having things
split out actually makes it easier to use a profiler to determine where the
bottleneck is.


One thing that is quite clear here is that we need to have some sort of
benchmark that we can use for evaluation - for example, if I had a good
benchmark test, I would have just tried the ideas above to see how they
fared.

- K


Giovanni Azua-2 wrote:
 
 
 On Apr 22, 2010, at 11:18 PM, trumpetinc wrote:
 
 I like your approach!  A simple if (ch > 32) return false; at the very top
 would give the most bang for the least effort (if you do go the bitmask
 route, be sure to include unit tests!).
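
For illustration, here is one way the early-out and the bitmask could be combined, together with the kind of exhaustive comparison a unit test might run. It assumes the PDF whitespace set (character codes 0, 9, 10, 12, 13 and 32) and an early-out for anything above 32; the class and method names are made up and this is not the patch from the thread.

```java
final class WhitespaceCheck {
    // Bit n is set when character code n is PDF whitespace.
    private static final long MASK =
            (1L << 0) | (1L << 9) | (1L << 10) | (1L << 12) | (1L << 13) | (1L << 32);

    /** Baseline: compare against each whitespace code in turn. */
    static boolean baseline(int ch) {
        return ch == 0 || ch == 9 || ch == 10 || ch == 12 || ch == 13 || ch == 32;
    }

    /** Early-out for the common case (ch > 32), then a single bit test. */
    static boolean fast(int ch) {
        if (ch > 32 || ch < 0) {
            return false; // ch < 0 covers the EOF marker -1
        }
        return ((MASK >>> ch) & 1L) != 0;
    }
}
```

The ch < 0 guard matters: without it, the EOF value -1 would be used as a shift distance, and Java would silently reduce it to 63.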
 
 
 Doing this change spares approximately two seconds out of the full
 workload so now shows 8s instead of 10s and isWhitespace stays at 1%.
 
 The numbers below include two extra changes: the one from trumpetinc above
 and migrating all StringBuffer references to use instead StringBuilder.
 
 The top are now:
 
 Method                          % of run time   Time   Invocations
 PRTokeniser.nextToken           8%              77s    19'268'000
 RandomAccessFileOrArray.read    6%              53s    149'047'680
 MappedRandomAccessFile.read     3%              26s    61'065'680
 PdfReader.removeUnusedCode      1%              15s    6000
 PdfEncodings.convertToBytes     1%              15s    5'296'207
 PRTokeniser.nextValidToken      1%              12s    9'862'000
 PdfReader.readPRObject          1%              10s    5'974'000
 ByteBuffer.append(char)         1%              10s    19'379'382
 PRTokeniser.backOnePosition     1%              10s    17'574'000
 PRTokeniser.isWhitespace        1%              8s     35'622'000
 
 A bit further down there is ByteBuffer.append_i that often

Re: [iText-questions] performance follow up

2010-04-23 Thread trumpetinc

Nah - I'm not saying that memory is cheap (or that cache misses aren't
important) - just saying that int -> char casting isn't the culprit here.
The parser is a really low level algorithm that is responsible for reading
ints from the bytes of a file and figuring out the appropriate value to
convert them to.  Sometimes it's a char, sometimes not.  By the time the
results of the parse are pulled from the parser, they are not ints anymore.
It's not like we are carrying around a massive block of int[] to represent a
string or anything like that.

The profiling results from Giovanni indicate that the call to isWhitespace
accounts for less than 1% of runtime, while the calls to
RandomAccessFileOrArray.read() (and its mapped IO derivatives) and
PRTokeniser.nextToken() consume 17% combined.  It's probably best to focus
on those.


Mike Marchywka-2 wrote:
 
 
 
 
  
 So you are doing everything internally with 32 bit chars?
 Not a big deal but if these are mostly zero there may be 
 better ways to represent and save memory. You may say, well
 RAM is cheap but that doesn't matter since low level caches
 are fixed but I guess you can get a bigger disk and say VM is unlimited.
  
  
  
  
 are actually consuming run time, and this method isn't one of them (no
 matter how much it could be optimized).
 
 The only person with data claimed otherwise :)
  
  
  


 
 

-- 
View this message in context: 
http://old.nabble.com/performance-follow-up-tp28322800p28344478.html
Sent from the iText - General mailing list archive at Nabble.com.




Re: [iText-questions] performance follow up

2010-04-23 Thread trumpetinc

Parsing PDF requires a lot of random access.  It tends to be chunked - move
to a particular offset in the file, then parse as a stream (this is why
paging makes sense, and why memory mapping is effective until the file gets
too big).  But the parsing is incredibly complex.  You can have nested
object structures, lots of alternative representations for the same type of
data, etc... 

And we definitely don't know the size of any of these structures ahead of time.


hmmm - just had a thought on IO performance.  I'll post that in a separate
message so we can keep the technical discussion separate.

- K


Mike Marchywka-2 wrote:
 
 
 You can have alt implementations in the mean time if you know
 size a priori. Ideally you would
 like to be able to operate on a stream and scrap random access.
  
 
 

-- 
View this message in context: 
http://old.nabble.com/performance-follow-up-tp28322800p28345099.html
Sent from the iText - General mailing list archive at Nabble.com.




Re: [iText-questions] performance follow up

2010-04-23 Thread trumpetinc

One thing occurs to me on the IO performance...  If we are using a memory
mapped file, the backing buffer is, by definition, on the native side of the
virtual/native boundary.  So reading one byte at a time requires a lot of
round trips across that boundary.  Even with memory mapping, it may make
sense to do some sort of paging...  I'll have to think on that a bit.
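
One way to picture the idea: keep the file mapped, but serve single-byte reads from a small heap array that is refilled with one bulk get() per window. This is only a sketch under assumptions. The class name and window size are hypothetical, a single mapping limits it to files under 2GB, and whether it actually beats per-byte access to the mapped buffer is exactly what would need benchmarking.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

final class WindowedMappedReader {
    private final MappedByteBuffer mapped;
    private final byte[] window;
    private int windowStart = -1; // file offset of window[0]; -1 means nothing loaded yet
    private int windowLen = 0;

    WindowedMappedReader(String path, int windowSize) throws IOException {
        this.mapped = map(path);
        this.window = new byte[windowSize];
    }

    private static MappedByteBuffer map(String path) throws IOException {
        try (FileChannel ch = FileChannel.open(Paths.get(path), StandardOpenOption.READ)) {
            // The mapping remains valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    /** Returns the unsigned byte at pos, or -1 if pos is outside the file. */
    int read(int pos) {
        if (pos < 0 || pos >= mapped.capacity()) {
            return -1;
        }
        if (windowStart < 0 || pos < windowStart || pos >= windowStart + windowLen) {
            windowStart = pos - (pos % window.length);
            windowLen = Math.min(window.length, mapped.capacity() - windowStart);
            mapped.position(windowStart);
            mapped.get(window, 0, windowLen); // one bulk copy instead of windowLen single reads
        }
        return window[pos - windowStart] & 0xff;
    }
}
```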

- K



Giovanni Azua-2 wrote:
 
 Hello trumpetinc,
 
 On Apr 23, 2010, at 7:29 PM, trumpetinc wrote:
 
 Giovanni - if your source PDFs are small enough, you might want to try
 this,
 just to get a feel for the impact that IO blocking is having on your
 results
 (read entire PDF into byte[] and use PdfReader(byte[]))
 
 Trying it right now ...
 
 The StringBuffer could definitely be replaced with a StringBuilder, and it
 could be re-used instead of re-allocating for each call to nextToken()
 
 This is what I applied yesterday with the patch I posted. It includes both
 changes in PRTokeniser: StringBuilder + reusing the same instances ... the
 improvement is somewhere around 6.2% faster for my test case. 
 
 I want to try this one you suggest above ... and then I will post the new
 numbers plus the cumulative patch I have ...
 
 Best regards,
 Giovanni
 
 

-- 
View this message in context: 
http://old.nabble.com/performance-follow-up-tp28322800p28345102.html
Sent from the iText - General mailing list archive at Nabble.com.




Re: [iText-questions] AWESOME! performance follow up

2010-04-23 Thread trumpetinc

Don't know if it'll make any difference, but the way you are reading the file
is horribly inefficient.  If the code you wrote is part of your test times,
you might want to re-try, but using this instead (I'm just tossing this
together - there might be typos):

ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] buf = new byte[8092];
int n;
while ((n = is.read(buf)) >= 0) {
baos.write(buf, 0, n);
}
return baos.toByteArray();


If loading the file into main memory makes any difference, that difference
will be a measure of the impact of virtual-native interface interaction. 
In effect, this is telling us whether the calls to file.read() should be
replaced with file.read(byte[]).
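
To make the difference concrete, the sketch below counts how many read calls each style issues against the same data. It uses an in-memory stream, so it shows call counts only and not real IO cost; the class name and helpers are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ReadCallCounter extends FilterInputStream {
    int calls = 0;

    ReadCallCounter(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        calls++;
        return super.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        calls++; // read(byte[]) is routed through this method by FilterInputStream
        return super.read(b, off, len);
    }

    /** Number of read calls needed to drain the data one byte at a time. */
    static int callsByteAtATime(byte[] data) throws IOException {
        ReadCallCounter in = new ReadCallCounter(new ByteArrayInputStream(data));
        while (in.read() != -1) {
            // discard
        }
        return in.calls;
    }

    /** Number of read calls needed to drain the data with a buffer of bufSize bytes. */
    static int callsChunked(byte[] data, int bufSize) throws IOException {
        ReadCallCounter in = new ReadCallCounter(new ByteArrayInputStream(data));
        byte[] buf = new byte[bufSize];
        while (in.read(buf) >= 0) {
            // discard
        }
        return in.calls;
    }
}
```

For a 100,000 byte input, the byte-at-a-time loop issues one call per byte plus one for end of stream, while an 8192-byte buffer needs only a handful of calls. Against a real file, each of those calls is a trip across the boundary discussed above.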



From your results, are you seeing a big difference between iText and the
competitor when you aren't flattening fields vs you are flattening fields? 
Your profiling results aren't indicating bottlenecks in that area of the
code.  If iText is much faster than the competitor in the non-flattening
scenario, but slower than the competitor in the flattening scenario, I'm
having a hard time reconciling the data presented so far.



Giovanni Azua-2 wrote:
 
 
 I am sooo sorry the performance is worse with the change for pre-loading
 the PDFs in the test-case :(( the problem was that I ran the
 benchmarks with a small mistake in my test case ... 
 
 Loading the HEADER demonstrates how to load flattened pre-formatted PDF
 part templates ...
 
 Loading the FOOTER demonstrates how to load PDF part templates containing
 fields  that need to be populated.
 
 The mistake was to always leave the HEADER fixed ... so it would load only
 the flattened PDF template and not the footer (see below) [sigh] In any
 case it is good to know that loading flattened PDF parts is cheaper.
 
 I mistakenly ran the last benchmark like this:
 
 private static byte[] file2ByteArray(String filePath) throws Exception {
   InputStream input = null;
   ByteArrayOutputStream output = null;
   try {
     input = new BufferedInputStream(new FileInputStream(HEADER_PATH));
     output = new ByteArrayOutputStream();
     int data = input.read();
     while (data != -1) {
       output.write(data);

       data = input.read();
     }

     return output.toByteArray();
   }
   finally {
     if (input != null) {
       input.close();
     }

     if (output != null) {
       output.close();
     }
   }
 }
 

-- 
View this message in context: 
http://old.nabble.com/performance-follow-up-tp28322800p28346146.html
Sent from the iText - General mailing list archive at Nabble.com.




Re: [iText-questions] performance follow up

2010-04-23 Thread trumpetinc

I'd love to discuss specific ideas on prediction - are you familiar enough
with the PDF spec to provide any suggestions?

Some obvious ones are the xref table - but iText reads that entirely into
memory one time and holds onto it, so it seems unlikely that pre-fetch would
do much there (other than having the last 1MB of the file be the first block
pre-fetched - but any sort of paging implementation would handle that
already).

The rest... well, from my experience with this, you've got objects that
refer to other objects that refer to other objects.  And there's really no
way to know where in the object graph you need to go until you parse and
then go there.  So I think I'll need some concrete examples of how this
might be done with PDF structure - just to get my creativity going!
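
As a baseline for that discussion, here is a minimal sketch of the generic
paging approach: fixed-size pages kept in a small cache that evicts the
least recently used page. It is not iText code. PagedSourceSketch, its
fields and the byte array standing in for the file are all hypothetical,
and a real implementation would load pages from a RandomAccessFile.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PagedSourceSketch {
    private final byte[] backing;   // stands in for the file on disk
    private final int pageSize;
    private final Map<Integer, byte[]> cache;
    int loads = 0;                  // number of simulated disk reads

    PagedSourceSketch(byte[] backing, int pageSize, final int maxPages) {
        this.backing = backing;
        this.pageSize = pageSize;
        // accessOrder = true keeps entries ordered from least to most
        // recently used, so the "eldest" entry is the eviction candidate.
        this.cache = new LinkedHashMap<Integer, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, byte[]> eldest) {
                return size() > maxPages;
            }
        };
    }

    // Returns the unsigned byte at offset, or -1 when out of range.
    int read(int offset) {
        if (offset < 0 || offset >= backing.length) {
            return -1;
        }
        int pageIndex = offset / pageSize;
        byte[] page = cache.get(pageIndex);
        if (page == null) {
            page = loadPage(pageIndex);
            cache.put(pageIndex, page);
        }
        return page[offset % pageSize] & 0xff;
    }

    private byte[] loadPage(int pageIndex) {
        loads++;
        int start = pageIndex * pageSize;
        int length = Math.min(pageSize, backing.length - start);
        byte[] page = new byte[length];
        System.arraycopy(backing, start, page, 0, length);
        return page;
    }
}
```

Any prediction scheme would have to beat this by knowing which page index
comes next before the parser asks for it, which is exactly the information
the object graph does not give up until it is parsed.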

- K


Mike Marchywka-2 wrote:
 
 
 


 Parsing PDF requires a lot of random access. It tends to be chunked -
 move
 to a particular offset in the file, then parse as a stream (this is why
 paging makes sense, and why memory mapping is effective until the file
 gets
 
 Yes, that is great, but instead of a generic MRU approach, are
 there better predictions you can make, or could you even start loading
 pages early instead of having to wait later? Maybe multithreading makes
 sense here.
  
  
  
 too big). But the parsing is incredibly complex. You can have nested
 object structures, lots of alternative representations for the same type
 of
 data, etc...
 
 Surely there are rules, and I'm sure this topic has been beaten
 to death in many CS courses (as have stats, LOL). Profiling
 should point to some suspects. Algorithmic optimizations may
 be possible, as may plain coding changes. Most compilers
 operate sequentially on input, perhaps in multiple passes; I'm
 sure you can find ideas easily in a variety of sources.
  
  

 And we definitely don't know size of any of these structures ahead of
 time.
 
 Well, you don't need to know it a week ahead of time, but
 you could maybe waste an access or two finding sizes if that
 can be done more quickly than just reading everything.
  
 
 
 

-- 
View this message in context: 
http://old.nabble.com/performance-follow-up-tp28322800p28346601.html
Sent from the iText - General mailing list archive at Nabble.com.




Re: [iText-questions] performance follow up

2010-04-22 Thread trumpetinc

I like your approach!  A simple if (ch > 32) return false; at the very top
would give the most bang for the least effort (if you do go the bitmask
route, be sure to include unit tests!).

I know there were a lot of calls to this method, but I'm curious:  in your
profiling, how much of the total processing _time_ was spent in that routine?
The if() would make this 6 times faster, but it's hard to believe that this
call has any appreciable contribution to runtime.
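
To make the two options concrete, here is a minimal sketch of both the
early-exit check and the bitmask variant, using the six character codes
from the isWhitespace method quoted below. The class and method names are
hypothetical, this is not the change that went into PRTokeniser, and
whether either version is measurably faster would still need profiling.

```java
final class WhitespaceSketch {
    // Bit i is set when character code i counts as whitespace:
    // 0, 9, 10, 12, 13 and 32, as in the quoted isWhitespace method.
    private static final long WHITESPACE_MASK =
            (1L << 0) | (1L << 9) | (1L << 10) | (1L << 12) | (1L << 13) | (1L << 32);

    // Early exit: most characters in a content stream are above 32,
    // so they are rejected after a single comparison.
    static boolean isWhitespaceEarlyExit(int ch) {
        if (ch > 32) {
            return false;
        }
        return ch == 0 || ch == 9 || ch == 10 || ch == 12 || ch == 13 || ch == 32;
    }

    // Bitmask variant. The range check is required because Java only uses
    // the low 6 bits of a long shift count, so 64 would otherwise alias 0.
    static boolean isWhitespaceMask(int ch) {
        return ch >= 0 && ch <= 32 && ((WHITESPACE_MASK >>> ch) & 1L) != 0;
    }

    // Tiny self-check: both variants must agree over a range of inputs.
    public static void main(String[] args) {
        for (int ch = -1; ch <= 255; ch++) {
            if (isWhitespaceEarlyExit(ch) != isWhitespaceMask(ch)) {
                throw new AssertionError("mismatch at " + ch);
            }
        }
    }
}
```

The loop in main is the kind of unit test the bitmask route needs: it is
easy to get a shift wrong or to forget the range check.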

Keep it up!

- K


Giovanni Azua-2 wrote:
 
 
 Now only 23.8% to go. We only need to make 4 more fixes like the last one
 and the gap will be gone :) The profiler shows there are still several
 bottlenecks at the top which could also be easy fixes, e.g.
 PRTokeniser.isWhitespace is a simple boolean condition that just happens
 to be called a gazillion times, e.g. 35'622'000 times for my test workload
 ... if instead of doing it like:
 
 public static final boolean isWhitespace(int ch) {
 return (ch == 0 || ch == 9 || ch == 10 || ch == 12 || ch == 13 || ch
 == 32);
 } 
 
 we used a bitwise binary operator with the appropriate mask(s), there
 could be some good performance gain ... 
 
 Best regards,
 Giovanni
 

-- 
View this message in context: 
http://old.nabble.com/performance-follow-up-tp28322800p28334733.html
Sent from the iText - General mailing list archive at Nabble.com.




Re: [iText-questions] performance follow up

2010-04-22 Thread trumpetinc

The semantics are different (the JSE call includes more characters in its
definition of whitespace than the PDF spec).  Not saying that it can't be
easily done, but throwing an if statement at it and seeing what impact it
has on performance is pretty easy also.
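
The difference is easy to see with a small comparison. The sketch below
checks character codes 0 through 32 against both definitions; the class and
method names are hypothetical, and pdfWhitespace simply repeats the six
codes from the PRTokeniser method quoted below. Note that the mismatch goes
both ways: Character.isWhitespace accepts some control characters that the
PDF check rejects, and it rejects NUL (0), which the PDF check accepts.

```java
import java.util.ArrayList;
import java.util.List;

final class WhitespaceSemanticsSketch {
    // The six codes from the quoted PRTokeniser.isWhitespace method.
    static boolean pdfWhitespace(int ch) {
        return ch == 0 || ch == 9 || ch == 10 || ch == 12 || ch == 13 || ch == 32;
    }

    // Character codes in 0..32 on which the two definitions disagree.
    static List<Integer> mismatches() {
        List<Integer> result = new ArrayList<Integer>();
        for (int ch = 0; ch <= 32; ch++) {
            if (pdfWhitespace(ch) != Character.isWhitespace(ch)) {
                result.add(ch);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("Codes where the definitions differ: " + mismatches());
    }
}
```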

What was the overall time %age spent in this call in your tests?


Giovanni Azua-2 wrote:
 
 Hello,
 
 On Apr 22, 2010, at 10:59 PM, Giovanni Azua wrote:
 PRTokeniser.isWhitespace is a simple boolean condition that just happens
 to be called a gazillion times, e.g. 35'622'000 times for my test workload
 ... if instead of doing it like:
 
 public static final boolean isWhitespace(int ch) {
 return (ch == 0 || ch == 9 || ch == 10 || ch == 12 || ch == 13 || ch
 == 32);
 } 
 
 we used a bitwise binary operator with the appropriate mask(s), there
 could be some good performance gain ... 
 
 The function already exists in
 http://java.sun.com/javase/6/docs/api/java/lang/Character.html#isWhitespace%28char%29
 I checked and it already uses bitwise binary operators with the right
 masks ... we would only need to inline it to avoid the function call
 costs.
 
 Best regards,
 Giovanni
 

-- 
View this message in context: 
http://old.nabble.com/performance-follow-up-tp28322800p28334828.html
Sent from the iText - General mailing list archive at Nabble.com.




Re: [iText-questions] Low level browsing of document structure

2010-03-30 Thread trumpetinc

Look at the parser package (com.itextpdf.text.pdf.parser) - you can start
with the PdfContentReaderTool as a starting point.  I think you'll find that
this will greatly simplify your efforts.

Only caveat:  I don't know if the parser has been ported to iTextSharp yet.

- K


Mircea Zahan wrote:
 
 Hi all,
 
 Everything is just peachy with iText when one
 only wants to write PDFs. But when it comes to
 reading, the documentation says almost nothing.
 Only basics, like metadata, pages etc.
 
 My problem: I need to obtain all the lines, curves
 etc. from a PDF together with their companions, that
 is, transformation matrices, colors etc. In short, all
 the graphic content.
 
 I have read everything that I could get my hands
 on and couldn't find a single example of how
 that can be achieved.
 
 Anyone knows how to do that?
 
 
 I also need to get the internal ID of an object,
 the one looking like: 20 R, 30 R etc. That also I
 couldn't figure out and didn't find it anywhere.
 Any luck with it?
 
 
 Most grateful,
 Mircea.
 
 
 

-- 
View this message in context: 
http://old.nabble.com/Low-level-browsing-of-document-structure-tp28078245p28084870.html
Sent from the iText - General mailing list archive at Nabble.com.




Re: [iText-questions] Rotate Page After Adding Text to Document

2010-01-20 Thread trumpetinc

I went through *exactly* the same source review last month when I started
hitting it.  I was quite happy to see that iText actually sets the rotation
in the page dictionary (I needed it for constructing unit tests), but I
agree that the way it happens is a bit hard to follow.  If you think about
it from the way the PDF spec is written, though, you could see how this
implementation evolved:

There is a dictionary entry for the unrotated page size, plus the rotation
entry.  The rectangle that the user has been working with has been in
rotated coordinates, so to set the unrotated page size, they have to
unrotate.
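
That round trip can be sketched in a few lines. This is a simplified
stand-in, not iText's classes: Rect and mediaBox are hypothetical names
that only mirror the swapping logic of Rectangle.rotate() and the
PdfRectangle constructor quoted further down in this message.

```java
final class RotationSketch {
    // Hypothetical stand-in for the user-facing page rectangle.
    static final class Rect {
        final float llx, lly, urx, ury;
        final int rotation;

        Rect(float llx, float lly, float urx, float ury, int rotation) {
            this.llx = llx;
            this.lly = lly;
            this.urx = urx;
            this.ury = ury;
            this.rotation = rotation;
        }

        // Same idea as Rectangle.rotate(): swap x and y, add 90 degrees.
        Rect rotate() {
            return new Rect(lly, llx, ury, urx, (rotation + 90) % 360);
        }
    }

    // Same idea as the PdfRectangle constructor: for 90 or 270 degrees the
    // coordinates are swapped back, giving the unrotated media box.
    static float[] mediaBox(Rect r) {
        if (r.rotation == 90 || r.rotation == 270) {
            return new float[] {r.lly, r.llx, r.ury, r.urx};
        }
        return new float[] {r.llx, r.lly, r.urx, r.ury};
    }
}
```

Starting from a 612 x 792 portrait rectangle, rotate() hands the user a
792 x 612 rectangle with rotation 90, while mediaBox() still yields
0 0 612 792, which is what ends up next to the rotation entry in the page
dictionary.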

On the content side, iText rotates the coordinate system just to make life
easier for the user.  And PDF has a complete disassociation between rotation
at the page level, and the required rotation at the content level. 
Confusing as all get out.

In my parsing code, I am currently ignoring the page rotation entirely, so
rotated pages wind up with text alignment being off by 90 degrees (generally
speaking, not an issue for text extraction because all of the text rotates -
but for rendering or geometric filtering, it is an issue).  At some point,
I'll have to address this - probably by applying yet another CTM when
computing user space coordinates.  Oy.
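
A rough sketch of what that extra transform might look like, assuming the
page rotation is a multiple of 90 degrees applied clockwise for display (as
the PDF spec defines it) and a media box with its lower-left corner at the
origin. The class and method names are hypothetical, and this is not what
the parser currently does.

```java
final class PageRotationSketch {
    // Maps a point from unrotated page space to the space of the page as
    // displayed. width and height are the unrotated page dimensions.
    static double[] toDisplay(double x, double y,
                              double width, double height, int rotate) {
        int normalized = ((rotate % 360) + 360) % 360;
        switch (normalized) {
            case 0:
                return new double[] {x, y};
            case 90:
                // Clockwise quarter turn: the old bottom edge becomes the left edge.
                return new double[] {y, width - x};
            case 180:
                return new double[] {width - x, height - y};
            case 270:
                // Counter-clockwise quarter turn: the old bottom edge becomes the right edge.
                return new double[] {height - y, x};
            default:
                throw new IllegalArgumentException("rotation must be a multiple of 90: " + rotate);
        }
    }
}
```

Each case is a rotation plus a translation, so it could equally be written
as one six-number matrix and concatenated with the existing CTM.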

- Kevin


Mark Storer-2 wrote:
 
 Looking through the Rectangle.rotate() -> Pdf-structures-in-the-output
 code, I think we might have An Issue.  Woah woah woah... let me check the
 trunk instead of my red-headed-stepchild-branch of 2.0.whatever-it-was.
 
 Rectangle.rotate() { // yep, no changes
   Rectangle rect = new Rectangle(lly, llx, ury, urx);
   rect.rotation = rotation + 90;
   rect.rotation %= 360;
   return rect;
 }
 
 It swaps the x's and ys, and sets the rotation member.
 
 In PdfDocument.newPage(), we find The Following Code:
 
 // [U1] page size and rotation
 int rotation = pageSize.getRotation();
 ...
 PdfPage page = new PdfPage(new PdfRectangle(pageSize, rotation),
 thisBoxSize, resources, rotation);
 
 
 So rotation gets passed to the new PdfRectangle and to the new PdfPage:
 
 public PdfRectangle(float llx, float lly, float urx, float ury,
                     int rotation) {
     super();
     if (rotation == 90 || rotation == 270) {
         this.llx = lly;
         this.lly = llx;
         this.urx = ury;
         this.ury = urx;
     }
     else {
         this.llx = llx;
         this.lly = lly;
         this.urx = urx;
         this.ury = ury;
     }
     super.add(new PdfNumber(this.llx));
     super.add(new PdfNumber(this.lly));
     super.add(new PdfNumber(this.urx));
     super.add(new PdfNumber(this.ury));
 }
 
 
 PdfRectangle swaps the coordinates *again*.  It doesn't store the rotation
 value, just makes use of it.
 
 And...
 PdfPage(PdfRectangle mediaBox, HashMap<String, PdfRectangle> boxSize,
         PdfDictionary resources, int rotate) {
     super(PAGE);
     this.mediaBox = mediaBox;
     put(PdfName.MEDIABOX, mediaBox);
     put(PdfName.RESOURCES, resources);
     if (rotate != 0) {
         // *** This is the only place it's used ***
         put(PdfName.ROTATE, new PdfNumber(rotate));
     }
     for (int k = 0; k < boxStrings.length; ++k) {
         PdfObject rect = boxSize.get(boxStrings[k]);
         if (rect != null)
             put(boxNames[k], rect);
     }
 }
 
 
 
 So we've swapped it back, and stored the value in the PdfDictionary for
 the page ONLY.  It's not retrieved anywhere in PdfPage...
 
 Ah!  In PdfContent, the rotation is taken from the original Rectangle
 again and used in a transformation matrix, just like Kev(in?) said.
 
 So while swapping the rectangle coordinates twice is certainly ODD, it
 doesn't look like there's anything genuinely broken in there... just an
 Even Number of Sign Errors.  Those are fine as long as you find both of
 them.  Finding one and having the correct output anyway is a bit
 maddening.  *twitch*
 
 --Mark Storer 
   Senior Software Engineer 
   Cardiff.com
 
 #include <disclaimer>
 typedef std::Disclaimer<Cardiff> DisCard;
 
 
 
 -Original Message-
 From: trumpetinc [mailto:forum_...@trumpetinc.com]
 Sent: Tuesday, January 19, 2010 9:00 PM
 To: itext-questions@lists.sourceforge.net
 Subject: Re: [iText-questions] Rotate Page After Adding Text 
 to Document
 
 
 
 As a point of clarification, I'm pretty sure that, in 
 addition to swapping
 width and height, rotate() signals PdfDocument to add a 
 rotation cm entry to
 the beginning of the content stream, and adjusts the rotation 
 dictionary
 entry for the page.
 
 And I completely agree with the 'messy code for dealing with 
 it' comment. 
 As an example, ImportedPage doesn't preserve the page 
 rotation from the
 source, which can cause all sorts of mayhem (esp because the 
 page rotation
 implies an awkward change in origin).
 
 - K
 
 
 Mark Storer-2 wrote:
  
  Ah.  So you don't want to spin-the-pages-contents-sideways, you want
  landscape-vs-portrait.
   
  Rotation isn't the word you want.  You just want

Re: [iText-questions] Rotate Page After Adding Text to Document

2010-01-19 Thread trumpetinc

Random thought (and more of a mental exercise than a real solution):  I
wonder if it's possible to insert an object reference for the parameters to
a cm operation in a content stream...

I know, for example, that it's possible to do this with text operations, so
I'd imagine that it's possible with other operations.

I'll leave the arduous task of actually doing such a thing as an exercise to
the reader ;-)


That won't, of course, take care of re-flowing the text (if that was
desired), but I don't think that stamping the content onto rotated pages
will do that either.


-- 
View this message in context: 
http://old.nabble.com/Rotate-Page-After-Adding-Text-to-Document-tp27234067p27234542.html
Sent from the iText - General mailing list archive at Nabble.com.




Re: [iText-questions] Rotate Page After Adding Text to Document

2010-01-19 Thread trumpetinc

As a point of clarification, I'm pretty sure that, in addition to swapping
width and height, rotate() signals PdfDocument to add a rotation cm entry to
the beginning of the content stream, and adjusts the rotation dictionary
entry for the page.

And I completely agree with the 'messy code for dealing with it' comment. 
As an example, ImportedPage doesn't preserve the page rotation from the
source, which can cause all sorts of mayhem (esp because the page rotation
implies an awkward change in origin).

- K


Mark Storer-2 wrote:
 
 Ah.  So you don't want to spin-the-pages-contents-sideways, you want
 landscape-vs-portrait.
  
 Rotation isn't the word you want.  You just want to change the page size
 from 8.5x11 to 11x8.5.  By the way, Rectangle.rotate() doesn't really
 rotate the page either, it swaps the width/height.  In PDF, there's a
 concept of page rotation independent of a page's physical dimensions,
 which can lead to all manner of Interesting Confusion (and messy code for
 dealing with it).
  
 

-- 
View this message in context: 
http://old.nabble.com/Rotate-Page-After-Adding-Text-to-Document-tp27234067p27236838.html
Sent from the iText - General mailing list archive at Nabble.com.




Re: [iText-questions] Unit Testing, Stress Testing, Profiling...

2010-01-16 Thread trumpetinc

For what it's worth, I've been able to create some pretty good content based
unit tests using the parser...  I have a filtering parser that I've put
together (haven't committed it yet) that allows you to specify a region of
the page to extract text from.  This makes it pretty easy to determine if
text was placed in the correct location.

It won't work for everything, of course, but for functionality related to
layout, etc... this may be useful.
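
To illustrate the idea only (this is not the filtering parser itself, which
works on real content streams): assume extraction yields text chunks with a
baseline position, and a test keeps just the chunks inside a rectangle. All
names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class RegionFilterSketch {
    // Hypothetical result of text extraction: a string and where it starts.
    static final class Chunk {
        final String text;
        final float x, y;

        Chunk(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    // Joins, in input order, the text of all chunks whose start point lies
    // inside the rectangle (llx, lly) to (urx, ury), edges included.
    static String textInRegion(List<Chunk> chunks,
                               float llx, float lly, float urx, float ury) {
        StringBuilder sb = new StringBuilder();
        for (Chunk c : chunks) {
            if (c.x >= llx && c.x <= urx && c.y >= lly && c.y <= ury) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(c.text);
            }
        }
        return sb.toString();
    }

    // Example page content used by the usage note below.
    static List<Chunk> samplePage() {
        List<Chunk> chunks = new ArrayList<Chunk>();
        chunks.add(new Chunk("Header", 72f, 750f));
        chunks.add(new Chunk("Body", 72f, 400f));
        chunks.add(new Chunk("Footer", 72f, 30f));
        return chunks;
    }
}
```

A layout test then reduces to a string comparison: for the sample page,
asking for the strip between y = 700 and y = 792 should return only the
header text.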

Cheers,

- K


Paulo Soares-3 wrote:
 
 As Mark said, unit tests for PDF are virtually impossible because it's
 extremely difficult to verify that a PDF is correct other than by opening
 and looking at it.
 
 Paulo
 
 -Original Message-
 From: Mark Storer [mailto:msto...@autonomy.com] 
 Sent: Thursday, January 14, 2010 5:15 PM
 To: Post all your questions about iText here
 Subject: Re: [iText-questions] Unit Testing, Stress Testing, 
 Profiling...
 
 All the unit tests are available in the source downloads at 
 http://sourceforge.net/projects/itext/files/ .  You can also 
 get the trunk from SVN at 
 
 I don't believe there's anything in the way of performance 
 testing in there, just the basic yes: it ran, no: it didn't 
 explode, yes: there's an output file stuff.  GOOD unit tests 
 for PDF are Very Hard.
 
 
 
 --Mark Storer 
   Senior Software Engineer 
   Cardiff.com
 
 #include <disclaimer>
 typedef std::Disclaimer<Cardiff> DisCard;
 
 
 
  -Original Message-
  From: Ghady Diab [mailto:ghady.d...@live.com]
  Sent: Wednesday, January 13, 2010 11:54 AM
  To: itext-questions@lists.sourceforge.net
  Subject: Re: [iText-questions] Unit Testing, Stress Testing,
  Profiling...
  
  
  Hey,
  
  It's Ghady DIAB from Lebanon. I'm really interested in your 
  iTextSharp 
  library (C#), and I'm working on a small project using it for 
  my university.
  
  Is there a way I can get the Unit Tests you did for this 
  library as well as 
  stress testing and profiling documents (results).
  
  If these documents are not available for free, I'll pay. Just 
  let me know if 
  they're available and how can I access them.
  
  Thanks in advance.
  
  Respectfully,
  Ghady DIAB
  --
  From: Bruno Lowagie br...@lowagie.com
  Sent: Wednesday, January 13, 2010 9:20 PM
  To: Ghady Diab ghady.d...@live.com
  Cc: sa...@itextpdf.com
  Subject: Re: Unit Testing, Stress Testing, Profiling...
  
   Ghady Diab wrote:
   Hey,
It's Ghady DIAB from Lebanon. I'm really interested in 
  your iTextSharp 
   library (C#), and I'm working on a small project using it for my 
   university.
Is there a way I can get the Unit Tests you did for this 
  library as well 
   as stress testing and profiling documents (results).
If these documents are not available for free, I'll pay. 
  Just let me 
   know if they're available and how can I access them.
  
   That's not really a sales question. The people who write unit tests are
   on the mailing list (Xavier Le Vourch, Kevin Day); you should post your
   question there. The address is itext-questions@lists.sourceforge.net, but
   you should register first, as I'm the only one who can approve questions,
   and I'm teaching iText in Paris the next two days (meaning: I probably
   won't be online much).
   best regards,
   Bruno
 
 

Re: [iText-questions] Unit Testing, Stress Testing, Profiling...

2010-01-14 Thread trumpetinc

For what it's worth, I've been able to create some pretty good content based
unit tests using the parser...  I have a filtering parser that I've put
together (haven't committed it yet) that allows you to specify a region of
the page to extract text from.  This makes it pretty easy to determine if
text was placed in the correct location.

It won't work for everything, of course, but for functionality related to
layout, etc... this may be useful.

Cheers,

- K 

-- 
View this message in context: 
http://old.nabble.com/Re%3A-Unit-Testing%2C-Stress-Testing%2C-Profiling...-tp27156414p27167317.html
Sent from the iText - General mailing list archive at Nabble.com.




Re: [iText-questions] How do I get the position of an image in a PDF file?

2009-12-21 Thread trumpetinc

I just committed some new (highly experimental) code to SVN (rev 4221).

See com.itextpdf.text.pdf.parser.ImageRenderListener

To see an example of registering a RenderListener with a PdfContentParser,
see PdfTextExtractor.getTextFromPage() in the same package (you'll pass your
own RenderListener in to the PdfContentStreamProcessor constructor).

The image part of this is all highly experimental and not at all well tested
(in fact, during your travels, if you happen to come up with some unit
tests, including smallish PDF files if necessary, be sure to let me know). 
Most of my effort right now is focused on improving text parsing, but if you
find things going on with the image side, let me know.

I'm also very open to suggestions for architectural changes related to the
structure of the render listeners (right now, the text and image listeners
have been intentionally kept separate - maybe that is good, maybe not).

I look forward to your feedback,

- K

On Fri, Nov 20, 2009 at 2:29 PM, trumpetinc2 forum_...@trumpetinc.com
wrote:
[...]


 I can put something together if you are interested in testing it out and
 providing feedback

Sure, I'd like to give it a try.

Thanks Larry


-- 
View this message in context: 
http://old.nabble.com/How-do-I-get-the-position-of-an-image-in-a-PDF-file--tp26417166p26884194.html
Sent from the iText - General mailing list archive at Nabble.com.

