Re: [iText-questions] Clarification on PdfCopy.freeReader()

2010-04-29 Thread trumpetinc

ok - that's what it looked like.  So nothing bad would happen, we'd just wind
up with resource content streams getting added multiple times.  I think that
PdfSmartCopy mostly addresses the downsides of that...

Thanks,

- K


Paulo Soares-3 wrote:
> 
> freeReader() makes the writer instance forget about that particular pdf.
> You may open the same pdf again but it will be like a different pdf and
> won't use any of the shared resources from the first one that you could
> use if the pdf was not freed. freeReader() is meant to be used when you're
> done with the doc, not if you intend to use it later.
> 
> Paulo
> 
> 
> From: 'Kevin Day' [ke...@trumpetinc.com]
> Sent: Thursday, April 29, 2010 12:55 AM
> To: IText Questions
> Subject: [iText-questions] Clarification on PdfCopy.freeReader()
> 
> I'm trying to get a handle on the implications of using the freeReader()
> call of PdfCopy.  If I call this, can I not add more pages from the
> 'freed' reader to the same PdfCopy instance?  Or is it safe to call:
> 
> pdfCopy.freeReader(myReader);
> PdfImportedPage imported = pdfCopy.getImportedPage(myReader, 1);
> pdfCopy.addPage(imported);
> pdfCopy.freeReader(myReader);
> PdfImportedPage imported = pdfCopy.getImportedPage(myReader, 2);
> pdfCopy.addPage(imported);
> 
> 
> Obviously, in a real app, these calls would not be in the same block of
> code.
> 
> 
> This is kind of a long way of asking if it wouldn't be better to just
> implicitly call freeReader() in addPage() if the reader is different from
> the currentReader.  I suspect that this would have performance
> implications for readers opened in partial mode, but I wanted to check my
> thinking.
> 
> Thanks,
> 
> - K
> 
> 
> 
> 
> 
> 
> --
> 
> ___
> iText-questions mailing list
> iText-questions@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/itext-questions
> 
> Buy the iText book: http://www.itextpdf.com/book/
> Check the site with examples before you ask questions:
> http://www.1t3xt.info/examples/
> You can also search the keywords list:
> http://1t3xt.info/tutorials/keywords/
> 

-- 
View this message in context: 
http://old.nabble.com/Clarification-on-PdfCopy.freeReader%28%29-tp28395359p28403248.html
Sent from the iText - General mailing list archive at Nabble.com.



Re: [iText-questions] Performance when flattening form fields

2010-04-26 Thread trumpetinc

Mike - can we please reserve this thread for a technical discussion of the
merits of the proposal?

I'd be happy to have a conversation in a separate thread regarding how iText
works.

- K


Mike Marchywka-2 wrote:
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
>> Date: Sun, 25 Apr 2010 22:14:02 -0700
>> From: forum_...@trumpetinc.com
>> To: itext-questions@lists.sourceforge.net
>> Subject: Re: [iText-questions] Performance when flattening form fields
>>
>>
>> After more digging, I'm wondering if the place to do this wouldn't be in
>> the
>> PdfCopy.PageStamp class? It seems like PageStamp.alterContents() could do
>> the same flattening operation that PdfStamper does.
>>
>> The ideal would be to factor out the behavior so the code isn't
>> duplicated
>> in both PdfCopy and PdfStamper...
> 
> I guess I have the larger question of what exactly parsing is.
> That is, it seems that generally you use iText to 1) read in something, often
> an existing pdf, 2) do some stuff, then 3) write out a pdf. Presumably
> as you go through step 2, you are assembling or compiling a bunch
> of structures that allow you to do step 3 but are more optimized
> for manipulating and editing the nascent PDF.
> If I understand your earlier comments, you apparently don't actually
> have a generic PDF parser to do step 1 that works with all sequences
> you could put into step 2. Now, of course, more generally the
> above approach doesn't scale, as you would always hope to stream
> to some extent: read what you need, write what you can, etc.
> However, that could probably be hidden somewhat in the implementation
> of the classes for each step.
> So, instead of things like PdfCoolFeature.doSomething(byte[] pdffile)
> you have PdfCoolFeature.doSomething(ParsedPdfOperand pdflikething)
> where the second signature takes a parameter that is generally
> optimized for a broad class of common operations.
>  
>>
>> Does anyone see any technical issues with this as a strategy?
>>
>> - K
>>
>>

Re: [iText-questions] Performance when flattening form fields

2010-04-25 Thread trumpetinc

After more digging, I'm wondering if the place to do this wouldn't be in the
PdfCopy.PageStamp class?  It seems like PageStamp.alterContents() could do
the same flattening operation that PdfStamper does.

The ideal would be to factor out the behavior so the code isn't duplicated
in both PdfCopy and PdfStamper...

Does anyone see any technical issues with this as a strategy?

- K


'Kevin Day' wrote:
> 
> 
> 
> 
> 
> 
> 
> 
> I've been doing some digging into the performance question that Giovanni
> Azua has posted about. 
>   
> Some of his findings (using StringBuilder, etc...) are solid improvements
> to overall iText performance - however, the crux of the performance
> difference he is seeing between iText and the competing solution is not
> low level.  It's a high level issue. 
>   
> Here's what's going on: 
>   
> His specific use case involves stamping headers and footers onto
> pages.  The footer contains AcroFields that must be flattened prior
> to stamping. 
>   
> The performance hit is coming from the fact that, in order to flatten and
> apply the footer, he is having to: 
>   
> 1.  Construct a PDF using PdfStamper 
> 2.  Write output to a byte array output stream 
> 3.  Re-parse the BAOS into a PdfReader 
> 4.  Import the page from the reader for use as a stamp 
>   
> While this is functional, it is certainly not performant. 
>   
> A much, much faster technique would be to do the flattening to the
> *reader*, then just import the page to the output writer.  This
> avoids the awkward creation of the temporary PdfReader. 
>   
>   
> So, the performance delta is not caused so much by iText's low level
> implementation (although the performance improvements that Giovanni has
> suggested will help to make iText even faster than it already is) - the
> delta is really caused by an awkward operation forced on the user by the
> framework. 
>   
>   
> So, are there any fundamental reasons to not do flattening, etc... to the
> PdfReader?  My first look at the code indicates that it may be
> possible to factor this out of PdfStamper (basically, instead of adjusting
> the AcroFields dictionary and content streams in the
> PdfStamper/PdfCopy/etc... output, we'd make those adjustments to the
> PdfReader). 
>   
> I'm thinking of something along the lines of: 
>   
> PdfFormFlattener(PdfReader).flatten(pageNumber) 
>   
> Maybe with supplemental methods for flattenNamedFields(pageNumber),
> flattenFieldsOfType(pageNumber) 
>   
> Thoughts? 
>   
> - K 
>   
> 
> 
> 


Re: [iText-questions] performance follow up

2010-04-24 Thread trumpetinc

If the file is being entirely pre-loaded, then I doubt that IO blocking is a
significant contributing factor to your test.

I think that the best clue here may be the difference between performance
with form flattening and without form flattening.  Just to confirm, am I
right in saying that iText outperforms the competitor by a significant
amount in the non-flattening scenario?  If that's the case, then it seems
like we should see significant differences in the profiling results between
the flattening and non-flattening scenarios in iText.

Would you be willing to post the profiling results for both cases so we can
see which code paths are consuming the most runtime in each?

Another possibility if the profiling results show similar hotspots is that
the form flattening algorithms in iText are using the hotspot areas a lot
more than in the non-flattening case.  There may be a bunch of redundant
reads or something in the flattening case.

Let's take a look at the profiling results and see if we can draw any
conclusions about where to go next.

BTW - which profiler are you using?  Are you able to expand each of the
hotspot code paths and see the actual call path that is causing the
bottleneck?  I use jvvm, and the results of expanding the hotspot call trees
can be quite illuminating.

What I really would like is to get hold of your two benchmark tests (with
and without flattening) so I can run them on my system - do you have anything
you can package up and share?

- K


Giovanni Azua-2 wrote:
> 
> Hello,
> 
> On Apr 23, 2010, at 10:50 PM, trumpetinc wrote:
>> Don't know if it'll make any difference, but the way you are reading the
>> file
>> is horribly inefficient.  If the code you wrote is part of your test
>> times,
>> you might want to re-try, but using this instead (I'm just tossing this
>> together - there might be typos):
>> 
>> ByteArrayOutputStream baos = new ByteArrayOutputStream();
>> byte[] buf = new byte[8092];
>> int n;
>> while ((n = is.read(buf)) >= 0) {
>>  baos.write(buf, 0, n);
>> }
>> return baos.toByteArray();
>> 
> I tried your suggestion above and it made no significant difference
> compared to doing the loading from iText. The fastest I could get my use
> case to work using this pre-loading concept was by loading the whole file
> in one shot using the code below.
> 
> Applying the cumulative patch plus preloading the whole PDF using the code
> below, my original test-case now performs 7.74% faster than before,
> roughly 22% away from competitor now ...  
> 
> btw the average response time numbers I was getting:
> 
> - average response time of 77ms original unchanged test-case from the
> office multi-processor-multi-core workstation 
> - average response time of 15ms original unchanged test-case from home
> using my MBP
> 
> I attribute the huge difference between those two similar experiments
> mainly to having an SSD drive in my MBP ... the top hot spots reported
> by the profiler are related one way or another to IO, so it would be no
> wonder that with an SSD drive the response time improves by a factor of
> 5x. There are other differences though, e.g. OS, JVM version.
> 
> Best regards,
> Giovanni
> 
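> // Needs imports: java.io.BufferedInputStream, java.io.EOFException,
> // java.io.File, java.io.FileInputStream, java.io.InputStream
> private static byte[] file2ByteArray(String filePath) throws Exception {
>   InputStream input = null;
>   try {
>     File file = new File(filePath);
>     input = new BufferedInputStream(new FileInputStream(file));
> 
>     byte[] buff = new byte[(int) file.length()];
>     int off = 0;
>     // read() may return fewer bytes than requested, so loop until full
>     while (off < buff.length) {
>       int n = input.read(buff, off, buff.length - off);
>       if (n < 0) {
>         throw new EOFException("Unexpected end of file: " + filePath);
>       }
>       off += n;
>     }
>     return buff;
>   }
>   finally {
>     if (input != null) {
>       input.close();
>     }
>   }
> }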
> 
> 
> 


Re: [iText-questions] performance follow up

2010-04-23 Thread trumpetinc

I'd love to discuss specific ideas on prediction - are you familiar enough
with the PDF spec to provide any suggestions?

Some obvious ones are the xref table - but iText reads that entirely into
memory one time and holds onto it, so it seems unlikely that pre-fetch would
do much there (other than having the last 1MB of the file be the first block
pre-fetched - but any sort of paging implementation would handle that
already).

The rest... well, from my experience with this, you've got objects that
refer to other objects that refer to other objects.  And there's really no
way to know where in the object graph you need to go until you parse and
then go there.  So I think I'll need some concrete examples of how this
might be done with PDF structure - just to get my creativity going!

- K


Mike Marchywka-2 wrote:
> 
> 
> 
>>
>>
>> Parsing PDF requires a lot of random access. It tends to be chunked -
>> move
>> to a particular offset in the file, then parse as a stream (this is why
>> paging makes sense, and why memory mapping is effective until the file
>> gets
> 
> Yes, that is great, but instead of a generic MRU approach, are
> there better predictions you can make, or could you even start loading
> pages before having to wait later, etc.? Maybe multithreading makes
> sense here.
>  
>  
>  
>> too big). But the parsing is incredibly complex. You can have nested
>> object structures, lots of alternative representations for the same type
>> of
>> data, etc...
> 
> Surely there are rules, and I'm sure this topic has been beaten
> to death in many CS courses (as have stats, LOL). Profiling
> should point to some suspects. Algorithmic optimizations may
> be possible, or maybe just coding changes. Most compilers
> operate sequentially on input, maybe in multiple passes. I'm
> sure you can find ideas easily in a variety of sources.
>  
>  
>>
>> And we definitely don't know size of any of these structures ahead of
>> time.
> 
> Well, you don't need to know it a week ahead of time, but
> you could maybe waste an access or two finding sizes if that
> can be done more quickly than just reading everything.
>  
> 


Re: [iText-questions] AWESOME! performance follow up

2010-04-23 Thread trumpetinc

Don't know if it'll make any difference, but the way you are reading the file
is horribly inefficient.  If the code you wrote is part of your test times,
you might want to re-try, but using this instead (I'm just tossing this
together - there might be typos):

ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] buf = new byte[8092];
int n;
while ((n = is.read(buf)) >= 0) {
baos.write(buf, 0, n);
}
return baos.toByteArray();


If loading the file into main memory makes any difference, that difference
will be a measure of the impact of virtual<->native interface interaction. 
In effect, this is telling us whether the calls to file.read() should be
replaced with file.read(byte[]).
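For anyone who wants to drop the loop above into a test harness, a self-contained version might look like the sketch below. The class name StreamLoader and the method name readFully are made up for illustration and are not part of iText; the 8192-byte buffer is an arbitrary choice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not an iText class.
class StreamLoader {

    // Reads the stream to its end in chunks rather than one byte per call.
    static byte[] readFully(InputStream is) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        // read(byte[]) returns -1 at end of stream
        while ((n = is.read(buf)) >= 0) {
            baos.write(buf, 0, n);
        }
        return baos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] source = new byte[20000];
        for (int i = 0; i < source.length; i++) {
            source[i] = (byte) i;
        }
        byte[] copy = readFully(new ByteArrayInputStream(source));
        System.out.println("bytes read: " + copy.length);
    }
}
```

The resulting byte[] can then be handed to whatever consumes the document in the benchmark.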



From your results, are you seeing a big difference between iText and the
competitor when you aren't flattening fields vs you are flattening fields? 
Your profiling results aren't indicating bottlenecks in that area of the
code.  If iText is much faster than the competitor in the non-flattening
scenario, but slower than the competitor in the flattening scenario, I'm
having a hard time reconciling the data presented so far.



Giovanni Azua-2 wrote:
> 
> 
> I am sooo sorry, the performance is worse with the change for pre-loading
> the PDFs in the test-case :(( The problem was that I ran the
> benchmarks with a small mistake in my test case ...
> 
> Loading the HEADER demonstrates how to load flattened pre-formatted PDF
> part templates ...
> 
> Loading the FOOTER demonstrates how to load PDF part templates containing
> fields that need to be populated.
> 
> The mistake was to always leave the HEADER fixed ... so it would load only
> the flattened PDF template and not the footer (see below) [sigh] In any
> case it is good to know that loading flattened PDF parts is cheaper.
> 
> I mistakenly ran the last benchmark like this:
> 
> private static byte[] file2ByteArray(String filePath) throws Exception {
>   InputStream input = null;
>   ByteArrayOutputStream output = null;
>   try {
>     input = new BufferedInputStream(new FileInputStream(HEADER_PATH));
>     output = new ByteArrayOutputStream();
>     int data = input.read();
>     while (data != -1) {
>       output.write(data);
> 
>       data = input.read();
>     }
> 
>     return output.toByteArray();
>   }
>   finally {
>     if (input != null) {
>       input.close();
>     }
> 
>     if (output != null) {
>       output.close();
>     }
>   }
> }


Re: [iText-questions] performance follow up

2010-04-23 Thread trumpetinc

One thing occurs to me on the IO performance...  If we are using a memory
mapped file, the backing buffer is, by definition, on the native side of the
virtual/native boundary.  So reading one byte at a time requires a lot of
round trips across that boundary.  Even with memory mapping, it may make
sense to do some sort of paging...  I'll have to think on that a bit.
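A rough sketch of that idea, under stated assumptions: keep one Java-side byte[] page and refill it from the mapped buffer with a single bulk get(), so that most single-byte reads never touch the mapped buffer. MappedChunkReader is a hypothetical name, not an iText class, and whether this actually beats per-byte reads on a MappedByteBuffer would need to be measured.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not an iText class. Limited to files under 2 GB
// because it uses a single mapping.
class MappedChunkReader {
    private final MappedByteBuffer mapped;
    private final byte[] page;     // heap copy of the current chunk
    private long pageStart = -1;   // file offset of page[0], -1 if nothing is loaded
    private int pageLength = 0;

    MappedChunkReader(Path file, int pageSize) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
        page = new byte[pageSize];
    }

    // Returns the byte at pos as 0..255, or -1 outside the file.
    int read(long pos) {
        if (pos < 0 || pos >= mapped.capacity()) {
            return -1;
        }
        if (pageStart < 0 || pos < pageStart || pos >= pageStart + pageLength) {
            pageStart = pos - (pos % page.length);
            pageLength = (int) Math.min(page.length, mapped.capacity() - pageStart);
            mapped.position((int) pageStart);
            mapped.get(page, 0, pageLength);   // one bulk copy per page
        }
        return page[(int) (pos - pageStart)] & 0xff;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("chunk", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[] {10, 20, 30, 40, 50});
        MappedChunkReader reader = new MappedChunkReader(tmp, 2);
        System.out.println("byte at offset 3: " + reader.read(3));
    }
}
```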

- K



Giovanni Azua-2 wrote:
> 
> Hello trumpetinc,
> 
> On Apr 23, 2010, at 7:29 PM, trumpetinc wrote:
> 
>> Giovanni - if your source PDFs are small enough, you might want to try
>> this,
>> just to get a feel for the impact that IO blocking is having on your
>> results
>> (read entire PDF into byte[] and use PdfReader(byte[]))
>> 
> Trying it right now ...
> 
>> The StringBuffer could definitely be replaced with a StringBuilder, and
>> it
>> could be re-used instead of re-allocating for each call to nextToken()
>> 
> This is what I applied yesterday with the patch I posted. It includes both
> changes in PRTokeniser: StringBuilder + reusing the same instances ... the
> improvement is somewhere around 6.2% faster for my test case. 
> 
> I want to try this one you suggest above ... and then I will post the new
> numbers plus the cumulative patch I have ...
> 
> Best regards,
> Giovanni


Re: [iText-questions] performance follow up

2010-04-23 Thread trumpetinc

Parsing PDF requires a lot of random access.  It tends to be chunked - move
to a particular offset in the file, then parse as a stream (this is why
paging makes sense, and why memory mapping is effective until the file gets
too big).  But the parsing is incredibly complex.  You can have nested
object structures, lots of alternative representations for the same type of
data, etc... 

And we definitely don't know size of any of these structures ahead of time.


hmmm - just had a thought on IO performance.  I'll post that in a separate
message so we can keep the technical discussion separate.

- K


Mike Marchywka-2 wrote:
> 
> 
> You can have alt implementations in the mean time if you know
> size a priori. Ideally you would
> like to be able to operate on a stream and scrap random access.
>  
> 
> 



Re: [iText-questions] performance follow up

2010-04-23 Thread trumpetinc

Nah - I'm not saying that memory is cheap (or that cache misses aren't
important) - just saying that int -> char casting isn't the culprit here. 
The parser is a really low level algorithm that is responsible for reading
int from the bytes of a file and figuring out the appropriate value to
convert them to.  Sometimes it's a char, sometimes not.  By the time the
results of the parse are pulled from the parser, they are not ints anymore. 
It's not like we are carrying around a massive block of int[] to represent a
string or anything like that.

The profiling results from Giovanni indicate that the call to isWhitespace
accounts for less than 1% of runtime, while the calls to
RandomAccessFileOrArray.read() (and its mapped IO derivatives) and
PRTokeniser.nextToken() consume 17% combined.  It's probably best to focus
on those.


Mike Marchywka-2 wrote:
> 
> 
> 
> 
>  
> So you are doing everything internally with 32 bit "chars?"
> Not a big deal but if these are mostly zero there may be 
> better ways to represent and save memory. You may say, "well
> RAM is cheap" but that doesn't matter since low level caches
> are fixed but I guess you can get a bigger disk and say VM is unlimited.
>  
>  
>  
>  
>> are actually consuming run time, and this method isn't one of them (no
>> matter how much it could be optimized).
> 
> The only person with data claimed otherwise :)
>  
>  
>  
>>
>>


Re: [iText-questions] performance follow up

2010-04-23 Thread trumpetinc

This tells us that the focus needs to be on PRTokeniser and RAFOA.  For what
it's worth, these have also shown up as the bottlenecks in profiling I've
done in the parser package (not too surprising).

I'll discuss each in turn:

RandomAccessFileOrArray - I've done a lot of thinking on this one in the
past.  This is an interesting class that can have massive impact on
performance, depending on how you use it.  For example, if you load your
source PDF entirely into memory, and pass the bytes into RAFOA, it will
remove IO bottlenecks.  

Giovanni - if your source PDFs are small enough, you might want to try this,
just to get a feel for the impact that IO blocking is having on your results
(read entire PDF into byte[] and use PdfReader(byte[]))


The next thing that I looked at was buffering (a naive use of
RandomAccessFile is horrible for performance, and significant gains can be
had by implementing a paging strategy).  I actually implemented a paging
RandomAccessFile and started work on rolling it into RAFOA last year, but my
benchmarks showed that the memory mapped strategy that RAFOA uses had
equivalent performance to the paging RAF implementation.

These tests weren't conclusive, so there may still be some things to learn
in this area.

The one problem with the memory mapped strategy (in its current
implementation) is that really big source PDFs still can't be loaded into
memory.  This could be addressed by using a paging strategy on the mapped
portion of the file - probably keep 10 or 15 mapped regions in an MRU cache
(maybe 1MB in size each).
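As a sketch only, the paging of mapped regions could be built on a LinkedHashMap in access order, which evicts the least recently used region once the cache is full. PagedMappedFile and its methods are hypothetical names, not existing iText code, and the region size and count are parameters here rather than the 1MB and 10 to 15 suggested above.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not an iText class.
class PagedMappedFile implements AutoCloseable {
    private final FileChannel channel;
    private final long fileSize;
    private final int regionSize;
    private final Map<Long, MappedByteBuffer> regions;

    PagedMappedFile(Path file, int regionSize, final int maxRegions) throws IOException {
        this.channel = FileChannel.open(file, StandardOpenOption.READ);
        this.fileSize = channel.size();
        this.regionSize = regionSize;
        // accessOrder = true: the eldest entry is the least recently used one
        this.regions = new LinkedHashMap<Long, MappedByteBuffer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, MappedByteBuffer> eldest) {
                // An evicted buffer is only unmapped once it is garbage collected.
                return size() > maxRegions;
            }
        };
    }

    // Returns the byte at pos as 0..255, or -1 outside the file.
    int read(long pos) throws IOException {
        if (pos < 0 || pos >= fileSize) {
            return -1;
        }
        long index = pos / regionSize;
        MappedByteBuffer region = regions.get(index);
        if (region == null) {
            long start = index * regionSize;
            long length = Math.min(regionSize, fileSize - start);
            region = channel.map(FileChannel.MapMode.READ_ONLY, start, length);
            regions.put(index, region);
        }
        return region.get((int) (pos - index * regionSize)) & 0xff;
    }

    int cachedRegions() {
        return regions.size();
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

Because only a bounded number of regions is mapped at once, the file size is no longer limited by what a single mapping can cover.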

For reference, the ugly (really ugly) hack that determines whether RAFOA
will use memory mapped IO is the Document.plainRandomAccess static public
variable (shudder).




So what about the code paths in PRTokeniser.nextToken()?

We've got a number of tight loops reading individual characters from the
RAFOA.  If the backing source isn't buffered, this would be a problem, but I
don't know that is really the issue here (it would be worth measuring
though...)

The StringBuffer could definitely be replaced with a StringBuilder, and it
could be re-used instead of re-allocating for each call to nextToken()
(this would probably help quite a bit, as I'll bet the default size of the
backing buffer has to keep growing during dictionary reads).
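To illustrate the re-use pattern (this is a toy whitespace tokenizer, not PRTokeniser; the class name ReusingTokenizer is made up): one StringBuilder lives as long as the tokenizer, and setLength(0) resets it between tokens while keeping whatever capacity it has already grown to.

```java
// Toy example, not iText code.
class ReusingTokenizer {
    private final String input;
    private int pos = 0;
    // Allocated once; the backing array keeps its grown capacity across calls.
    private final StringBuilder buf = new StringBuilder(64);

    ReusingTokenizer(String input) {
        this.input = input;
    }

    // Returns the next whitespace-delimited token, or null at end of input.
    String nextToken() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        if (pos >= input.length()) {
            return null;
        }
        buf.setLength(0);   // clear the contents, keep the capacity
        while (pos < input.length() && !Character.isWhitespace(input.charAt(pos))) {
            buf.append(input.charAt(pos++));
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        ReusingTokenizer t = new ReusingTokenizer("/Type /Page 42 0 R");
        for (String tok = t.nextToken(); tok != null; tok = t.nextToken()) {
            System.out.println(tok);
        }
    }
}
```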

Another thing that could make a difference is ordering of the case and if's
- for example, the default: branch turns around and does a check for (ch ==
'-' || ch == '+' || ch == '.' || (ch >= '0' && ch <= '9')).  Changing this to
be:

case '-':
case '+':
case '.':
case '0':
...
case '9':

May be better.
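Spelled out in full, the two forms look like the sketch below; NumberStart and its method names are illustrative only. The code shows that both give the same answer for every input, not that the switch is faster. That part still has to be measured.

```java
// Illustrative only, not iText code.
class NumberStart {

    // The condition as currently checked in the default: branch.
    static boolean viaIfChain(int ch) {
        return ch == '-' || ch == '+' || ch == '.' || (ch >= '0' && ch <= '9');
    }

    // The same test expressed as case labels.
    static boolean viaSwitch(int ch) {
        switch (ch) {
            case '-': case '+': case '.':
            case '0': case '1': case '2': case '3': case '4':
            case '5': case '6': case '7': case '8': case '9':
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        // -1 stands for end of stream, 0..255 for byte values
        for (int ch = -1; ch < 256; ch++) {
            if (viaIfChain(ch) != viaSwitch(ch)) {
                throw new AssertionError("mismatch at " + ch);
            }
        }
        System.out.println("both forms agree for -1..255");
    }
}
```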


The loops that check for while (ch != -1 && ((ch >= '0' && ch <= '9') || ch
== '.')) could also probably be optimized by removing the && ch != -1 check
- the other conditions ensure that the loop will escape if ch==-1
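A quick way to convince yourself of this is the exhaustive check below (DigitLoop is a made-up name for illustration): -1 is neither a digit nor '.', so the loop condition is already false at end of stream without the explicit test.

```java
// Illustrative only, not iText code.
class DigitLoop {

    // Loop condition as it stands today.
    static boolean original(int ch) {
        return ch != -1 && ((ch >= '0' && ch <= '9') || ch == '.');
    }

    // Loop condition with the end-of-stream test dropped.
    static boolean simplified(int ch) {
        return (ch >= '0' && ch <= '9') || ch == '.';
    }

    public static void main(String[] args) {
        // -1 is end of stream; 0..255 covers every byte value read() can return
        for (int ch = -1; ch < 256; ch++) {
            if (original(ch) != simplified(ch)) {
                throw new AssertionError("conditions differ at " + ch);
            }
        }
        System.out.println("conditions agree for -1..255");
    }
}
```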


It might be interesting to break the top level parsing branches into
separate functions so the profiler can tell us which of these main branches is
consuming the bulk of the run time.


Those are the obvious low hanging fruit that I see.

Final point:  I've seen some comments suggesting inlining of some code. 
Modern VMs are quite good at doing this sort of inlining automatically - a
test would be advisable before worrying about it too much.  Having things
split out actually makes it easier to use a profiler to determine where the
bottleneck is.


One thing that is quite clear here is that we need to have some sort of
benchmark that we can use for evaluation - for example, if I had a good
benchmark test, I would have just tried the ideas above to see how they
fared.

- K


Giovanni Azua-2 wrote:
> 
> 
> On Apr 22, 2010, at 11:18 PM, trumpetinc wrote:
>> 
>> I like your approach!  A simple if (ch > 32) return false; at the very
>> top
>> would give the most bang for the least effort (if you do go the bitmask
>> route, be sure to include unit tests!).
> 
> 
> Doing this change spares approximately two seconds out of the full
> workload so now shows 8s instead of 10s and isWhitespace stays at 1%.
> 
> The numbers below include two extra changes: the one from trumpetinc above
> and migrating all StringBuffer references to use instead StringBuilder.
> 
> The top are now:
> 
> PRTokeniser.nextToken          8%   77s   19'268'000 invocations
> RandomAccessFileOrArray.read   6%   53s  149'047'680 invocations
> MappedRandomAccessFile.read    3%   26s   61'065'680 invocations
> PdfReader.removeUnusedCode     1%   15s         6000 invocations
> PdfEncodings.convertToBytes    1%   15s    5'296'207 invocations
> PRTokeniser.nextVa

Re: [iText-questions] performance follow up

2010-04-23 Thread trumpetinc

Yes - it needs to be int.  Regardless, we need to focus on the things that
are actually consuming run time, and this method isn't one of them (no
matter how much it could be optimized).
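
For reference, a sketch of the lookup-table idea from the quoted message
below that keeps the int parameter (WhitespaceTable is a made-up name).  The
int is kept here on the assumption that callers pass the result of a read()
that returns -1 at end of input, so the table access needs a range guard:

```java
/** Hypothetical sketch: table-driven whitespace test with an int argument. */
final class WhitespaceTable {
    private static final boolean[] WHITESPACE = new boolean[256];

    static {
        // The six values tested by the isWhitespace() quoted in this thread.
        for (int ch : new int[] {0, 9, 10, 12, 13, 32}) {
            WHITESPACE[ch] = true;
        }
    }

    static boolean isWhitespace(int ch) {
        // The guard keeps -1 (end of input) and values above 255 out of the table.
        return ch >= 0 && ch < 256 && WHITESPACE[ch];
    }
}
```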


Mike Marchywka-2 wrote:
> 
> 
> 
> 
> does this have to be int vs char or byte? I think earlier I suggested
> operating on byte[] instead of making a bunch of temp strings
> but I don't know the context well enough to know if this makes sense.
> Certainly De Morgan can help but casts and calls are not free either.
>  
> Also, maybe hotspot runtime has gotten better but I have found in
> the past that lookup tables can quickly become competitive
> with bit operators ( if your param is byte instead of int, a
> 256 entry table can tell you if the byte is a member of which classes). 
>  
> 
> 

-- 
View this message in context: 
http://old.nabble.com/performance-follow-up-tp28322800p28343789.html
Sent from the iText - General mailing list archive at Nabble.com.


--
___
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions

Buy the iText book: http://www.itextpdf.com/book/
Check the site with examples before you ask questions: 
http://www.1t3xt.info/examples/
You can also search the keywords list: http://1t3xt.info/tutorials/keywords/


Re: [iText-questions] performance follow up

2010-04-22 Thread trumpetinc

The semantics are different (the JSE call includes more characters in its
definition of whitespace than the PDF spec).  Not saying that it can't be
easily done, but throwing an if statement at it and seeing what impact it
has on performance is pretty easy also.
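
To make the difference concrete, here is a sketch (WhitespaceSemantics is a
made-up name) that lists the byte values on which Character.isWhitespace(int)
and the six-value test quoted below disagree.  For example, NUL (0) counts as
whitespace in the quoted method but not for the JSE call, and vertical tab
(11) is the other way around:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: compares the two whitespace definitions. */
final class WhitespaceSemantics {
    /** Whitespace as tested by the isWhitespace() quoted in this thread. */
    static boolean pdfWhitespace(int ch) {
        return ch == 0 || ch == 9 || ch == 10 || ch == 12 || ch == 13 || ch == 32;
    }

    /** Byte values (0..255) on which the two definitions disagree. */
    static List<Integer> disagreements() {
        List<Integer> result = new ArrayList<Integer>();
        for (int ch = 0; ch < 256; ch++) {
            if (pdfWhitespace(ch) != Character.isWhitespace(ch)) {
                result.add(ch);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(disagreements());
    }
}
```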

What was the overall time %age spent in this call in your tests?


Giovanni Azua-2 wrote:
> 
> Hello,
> 
> On Apr 22, 2010, at 10:59 PM, Giovanni Azua wrote:
>> PRTokeniser.isWhitespace is a simple boolean condition that just happens
>> to be called a gazillion times e.g. 35'622'000 times for my test workload
>> ... if instead of doing it like:
>> 
>> public static final boolean isWhitespace(int ch) {
>>     return (ch == 0 || ch == 9 || ch == 10 || ch == 12 || ch == 13
>>             || ch == 32);
>> }
>> 
>> we used a bitwise binary operator with the appropriate mask(s), there
>> could be some good performance gain ... 
>> 
> The function already exists in
> http://java.sun.com/javase/6/docs/api/java/lang/Character.html#isWhitespace%28char%29
> I checked and it already uses bitwise binary operators with the right
> masks ... we would only need to inline it to avoid the function call
> costs.
> 
> Best regards,
> Giovanni

-- 
View this message in context: 
http://old.nabble.com/performance-follow-up-tp28322800p28334828.html
Sent from the iText - General mailing list archive at Nabble.com.




Re: [iText-questions] performance follow up

2010-04-22 Thread trumpetinc

I like your approach!  A simple if (ch > 32) return false; at the very top
would give the most bang for the least effort (if you do go the bitmask
route, be sure to include unit tests!).

I know there were a lot of calls to this method, but I'm curious:  in your
profiling, how much of the total processing _time_ was spent in that routine? 
The if() would make this 6 times faster, but it's hard to believe that this
call has any appreciable contribution to runtime.
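
A sketch of the variants under discussion, with the kind of check a unit test
would make (IsWhitespaceVariants and the method names are made up; the first
method is the one quoted below).  Note the range guard in the bitmask version:
Java takes a long shift count modulo 64, so without it ch == 64 would hit
bit 0 and be reported as whitespace.

```java
/** Hypothetical sketch: three equivalent forms of the whitespace test. */
final class IsWhitespaceVariants {
    // Bits 0, 9, 10, 12, 13 and 32 set, one per whitespace value.
    private static final long MASK =
            1L | (1L << 9) | (1L << 10) | (1L << 12) | (1L << 13) | (1L << 32);

    /** The method as quoted in this thread. */
    static boolean original(int ch) {
        return (ch == 0 || ch == 9 || ch == 10 || ch == 12 || ch == 13 || ch == 32);
    }

    /** Same test, with the early exit suggested above. */
    static boolean earlyExit(int ch) {
        if (ch > 32) {
            return false;
        }
        return (ch == 0 || ch == 9 || ch == 10 || ch == 12 || ch == 13 || ch == 32);
    }

    /** Bitmask form; the guard is needed because shifts wrap modulo 64. */
    static boolean bitmask(int ch) {
        return ch >= 0 && ch <= 32 && ((MASK >>> ch) & 1L) != 0;
    }
}
```

Looping all three over -1 through a few hundred and comparing results is
about all the unit test needs to do.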

Keep it up!

- K


Giovanni Azua-2 wrote:
> 
> 
> Now only 23.8% to go. We only need to make 4 more fixes like the last one
> and the gap will be gone :) The Profiler shows there are still several
> bottlenecks topping which could also be easy fixes e.g.
> PRTokeniser.isWhitespace is a simple boolean condition that just happens to
> be called a gazillion times e.g. 35'622'000 times for my test workload ...
> if instead of doing it like:
> 
> public static final boolean isWhitespace(int ch) {
>     return (ch == 0 || ch == 9 || ch == 10 || ch == 12 || ch == 13
>             || ch == 32);
> }
> 
> we used a bitwise binary operator with the appropriate mask(s), there
> could be some good performance gain ... 
> 
> Best regards,
> Giovanni

-- 
View this message in context: 
http://old.nabble.com/performance-follow-up-tp28322800p28334733.html
Sent from the iText - General mailing list archive at Nabble.com.




Re: [iText-questions] Low level browsing of document structure

2010-03-30 Thread trumpetinc

Look at the parser package (com.itextpdf.text.pdf.parser) - the
PdfContentReaderTool is a good starting point.  I think you'll find that
this will greatly simplify your efforts.

Only caveat:  I don't know if the parser has been ported to iTextSharp yet.

- K


Mircea Zahan wrote:
> 
> Hi all,
> 
> Everything is just peachy with iText when one
> only wants to write PDFs. But when it comes to
> reading, the documentation says almost nothing.
> Only basics, like metadata, pages etc.
> 
> My problem: I need to obtain all the lines, curves
> etc. from a PDF together with their companions, that
> is, transformation matrices, colors etc. In short, all
> the graphic content.
> 
> I have read everything that I could get my hands
> on and couldn't find a single example of how
> that can be achieved.
> 
> Does anyone know how to do that?
> 
> 
> I also need to get the internal ID of an object,
> the one looking like: 20 R, 30 R etc. That also I
> couldn't figure out and didn't find it anywhere.
> Any luck with it?
> 
> 
> Most grateful,
> Mircea.
> 

-- 
View this message in context: 
http://old.nabble.com/Low-level-browsing-of-document-structure-tp28078245p28084870.html
Sent from the iText - General mailing list archive at Nabble.com.




Re: [iText-questions] Rotate Page After Adding Text to Document

2010-01-20 Thread trumpetinc

I went through *exactly* the same source review last month when I started
hitting it.  I was quite happy to see that iText actually sets the rotation
in the page dictionary (I needed it for constructing unit tests), but I
agree that the way it happens is a bit hard to follow.  If you think about
it from the way the PDF spec is written, though, you could see how this
implementation evolved:

There is a dictionary entry for the unrotated page size, plus the rotation
entry.  The rectangle that the user has been working with has been in
rotated coordinates, so to set the unrotated page size, they have to
unrotate.
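
A stripped-down model of that round trip, based only on the two snippets
quoted below (RotationSketch and Rect are stand-ins, not the iText classes):
rotate() swaps the coordinates and bumps the rotation, and building the media
box swaps them back for 90 and 270, so the page dictionary ends up with the
unrotated size plus a rotation entry.

```java
/** Hypothetical sketch of the swap / swap-back described above. */
final class RotationSketch {
    /** Stand-in for a page rectangle plus its rotation. */
    static final class Rect {
        final float llx, lly, urx, ury;
        final int rotation;

        Rect(float llx, float lly, float urx, float ury, int rotation) {
            this.llx = llx;
            this.lly = lly;
            this.urx = urx;
            this.ury = ury;
            this.rotation = rotation;
        }

        /** Mirrors the quoted Rectangle.rotate(): swap x and y, add 90. */
        Rect rotate() {
            return new Rect(lly, llx, ury, urx, (rotation + 90) % 360);
        }
    }

    /** Mirrors the quoted PdfRectangle constructor: unswap for 90 and 270. */
    static float[] mediaBox(Rect r) {
        if (r.rotation == 90 || r.rotation == 270) {
            return new float[] {r.lly, r.llx, r.ury, r.urx};
        }
        return new float[] {r.llx, r.lly, r.urx, r.ury};
    }
}
```

For a 612 x 792 portrait page, rotate() gives the user a 792 x 612 rectangle
to work with, while mediaBox() still yields 0 0 612 792 alongside a rotation
of 90.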

On the content side, iText rotates the coordinate system just to make life
easier for the user.  And PDF has a complete disassociation between rotation
at the page level, and the required rotation at the content level. 
Confusing as all get out.

In my parsing code, I am currently ignoring the page rotation entirely, so
rotated pages wind up with text alignment being off by 90 degrees (generally
speaking, not an issue for text extraction because all of the text rotates -
but for rendering or geometric filtering, it is an issue).  At some point,
I'll have to address this - probably by applying yet another CTM when
computing user space coordinates.  Oy.

- Kevin


Mark Storer-2 wrote:
> 
> Looking through the Rectangle.rotate() -> Pdf-structures-in-the-output
> code, I think we might have An Issue.  Woah woah woah... let me check the
> trunk instead of my red-headed-stepchild-branch of 2.0.whatever-it-was.
> 
> Rectangle.rotate() { // yep, no changes
>   Rectangle rect = new Rectangle(lly, llx, ury, urx);
>   rect.rotation = rotation + 90;
>   rect.rotation %= 360;
>   return rect;
> }
> 
> It swaps the x's and ys, and sets the rotation member.
> 
> In PdfDocument.newPage(), we find The Following Code:
> 
> // [U1] page size and rotation
> int rotation = pageSize.getRotation();
> ...
> PdfPage page = new PdfPage(new PdfRectangle(pageSize, rotation),
> thisBoxSize, resources, rotation);
> 
> 
> So rotation gets passed to the new PdfRectangle and to the new PdfPage:
> 
> public PdfRectangle(float llx, float lly, float urx, float ury,
>         int rotation) {
>     super();
>     if (rotation == 90 || rotation == 270) {
>         this.llx = lly;
>         this.lly = llx;
>         this.urx = ury;
>         this.ury = urx;
>     }
>     else {
>         this.llx = llx;
>         this.lly = lly;
>         this.urx = urx;
>         this.ury = ury;
>     }
>     super.add(new PdfNumber(this.llx));
>     super.add(new PdfNumber(this.lly));
>     super.add(new PdfNumber(this.urx));
>     super.add(new PdfNumber(this.ury));
> }
> 
> 
> PdfRectangle swaps the coordinates *again*.  It doesn't store the rotation
> value, just makes use of it.
> 
> And...
> PdfPage(PdfRectangle mediaBox, HashMap boxSize,
>         PdfDictionary resources, int rotate) {
>     super(PAGE);
>     this.mediaBox = mediaBox;
>     put(PdfName.MEDIABOX, mediaBox);
>     put(PdfName.RESOURCES, resources);
>     if (rotate != 0) {
>         // *** This is the only place it's used ***
>         put(PdfName.ROTATE, new PdfNumber(rotate));
>     }
>     for (int k = 0; k < boxStrings.length; ++k) {
>         PdfObject rect = boxSize.get(boxStrings[k]);
>         if (rect != null)
>             put(boxNames[k], rect);
>     }
> }
> 
> 
> 
> So we've swapped it back, and stored the value in the PdfDictionary for
> the page ONLY.  It's not retrieved anywhere in PdfPage...
> 
> Ah!  In PdfContent, the rotation is taken from the original Rectangle
> again and used in a transformation matrix, just like Kev(in?) said.
> 
> So while swapping the rectangle coordinates twice is certainly ODD, it
> doesn't look like there's anything genuinely broken in there... just an
> "Even Number of Sign Errors".  Those are fine as long as you find both of
> them.  Finding one and having the "correct" output anyway is a bit
> maddening.  *twitch*
> 
> --Mark Storer 
>   Senior Software Engineer 
>   Cardiff.com
> 
> #include  
> typedef std::Disclaimer DisCard; 
> 
> 
> 
>> -Original Message-
>> From: trumpetinc [mailto:forum_...@trumpetinc.com]
>> Sent: Tuesday, January 19, 2010 9:00 PM
>> To: itext-questions@lists.sourceforge.net
>> Subject: Re: [iText-questions] Rotate Page After Adding Text 
>> to Document
>> 
>> 
>> 
>> As a point of clarification, I'm pretty sure that, in 
>> addition to swapping
>> width and height, rotate() signals PdfDocument to add a 
>> rotation cm entry to
>> the beginning of the content stream, and adjusts 

Re: [iText-questions] Rotate Page After Adding Text to Document

2010-01-19 Thread trumpetinc

As a point of clarification, I'm pretty sure that, in addition to swapping
width and height, rotate() signals PdfDocument to add a rotation cm entry to
the beginning of the content stream, and adjusts the rotation dictionary
entry for the page.

And I completely agree with the 'messy code for dealing with it' comment. 
As an example, ImportedPage doesn't preserve the page rotation from the
source, which can cause all sorts of mayhem (esp because the page rotation
implies an awkward change in origin).

- K


Mark Storer-2 wrote:
> 
> Ah.  So you don't want to spin-the-pages-contents-sideways, you want
> landscape-vs-portrait.
>  
> "Rotation" isn't the word you want.  You just want to change the page size
> from 8.5x11 to 11x8.5.  By the way, Rectangle.rotate() doesn't really
> rotate the page either, it swaps the width/height.  In PDF, there's a
> concept of page rotation independent of a page's physical dimensions,
> which can lead to all manner of Interesting Confusion (and messy code for
> dealing with it).
>  
> 

-- 
View this message in context: 
http://old.nabble.com/Rotate-Page-After-Adding-Text-to-Document-tp27234067p27236838.html
Sent from the iText - General mailing list archive at Nabble.com.




Re: [iText-questions] Rotate Page After Adding Text to Document

2010-01-19 Thread trumpetinc

Random thought (and more of a mental exercise than a real solution):  I
wonder if it's possible to insert an object reference for the parameters to
a cm operation in a content stream...

I know, for example, that it's possible to do this with text operations, so
I'd imagine that it's possible with other operations.

I'll leave the arduous task of actually doing such a thing as an exercise to
the reader ;-)


That won't, of course, take care of re-flowing the text (if that was
desired), but I don't think that stamping the content onto rotated pages
will do that either.


-- 
View this message in context: 
http://old.nabble.com/Rotate-Page-After-Adding-Text-to-Document-tp27234067p27234542.html
Sent from the iText - General mailing list archive at Nabble.com.




Re: [iText-questions] Unit Testing, Stress Testing, Profiling...

2010-01-16 Thread trumpetinc

For what it's worth, I've been able to create some pretty good content based
unit tests using the parser...  I have a filtering parser that I've put
together (haven't committed it yet) that allows you to specify a region of
the page to extract text from.  This makes it pretty easy to determine if
text was placed in the correct location.

It won't work for everything, of course, but for functionality related to
layout, etc... this may be useful.
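
To give a feel for the approach, here is a rough sketch.  It is not the
uncommitted filtering parser: RegionTextFilter and TextChunk are made-up
names, and the chunks are assumed to have been collected from the parser
together with their positions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: keep only the text that starts inside a page region. */
final class RegionTextFilter {
    /** A piece of extracted text and the position where it starts. */
    static final class TextChunk {
        final String text;
        final float x, y;

        TextChunk(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /** Joins, in input order, the chunks whose start point lies in the region. */
    static String textInRegion(List<TextChunk> chunks,
                               float llx, float lly, float urx, float ury) {
        StringBuilder sb = new StringBuilder();
        for (TextChunk c : chunks) {
            if (c.x >= llx && c.x <= urx && c.y >= lly && c.y <= ury) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(c.text);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<TextChunk> chunks = new ArrayList<TextChunk>();
        chunks.add(new TextChunk("Header", 72f, 750f));
        chunks.add(new TextChunk("Body", 72f, 400f));
        // Only the top band of a 612 x 792 page.
        System.out.println(textInRegion(chunks, 0f, 700f, 612f, 792f));
    }
}
```

A layout test then asserts that the expected string comes back from the
region where the text was supposed to be placed.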

Cheers,

- K


Paulo Soares-3 wrote:
> 
> As Mark said unit tests for PDF are virtually impossible because it's
> extremely difficult to verify that a PDF is correct other than by opening
> and looking at it.
> 
> Paulo
> 
>> -Original Message-
>> From: Mark Storer [mailto:msto...@autonomy.com] 
>> Sent: Thursday, January 14, 2010 5:15 PM
>> To: Post all your questions about iText here
>> Subject: Re: [iText-questions] Unit Testing, Stress Testing, 
>> Profiling...
>> 
>> All the unit tests are available in the source downloads at 
>> http://sourceforge.net/projects/itext/files/ .  You can also 
>> get the trunk from SVN at 
>> 
>> I don't believe there's anything in the way of performance 
>> testing in there, just the basic "yes: it ran, no: it didn't 
>> explode, yes: there's an output file" stuff.  GOOD unit tests 
>> for PDF are Very Hard.
>> 
>> 
>> 
>> --Mark Storer 
>>   Senior Software Engineer 
>>   Cardiff.com
>> 
>> #include  
>> typedef std::Disclaimer DisCard; 
>> 
>> 
>> 
>> > -Original Message-
>> > From: Ghady Diab [mailto:ghady.d...@live.com]
>> > Sent: Wednesday, January 13, 2010 11:54 AM
>> > To: itext-questions@lists.sourceforge.net
>> > Subject: Re: [iText-questions] Unit Testing, Stress Testing,
>> > Profiling...
>> > 
>> > 
>> > Hey,
>> > 
>> > It's Ghady DIAB from Lebanon. I'm really interested in your 
>> > iTextSharp 
>> > library (C#), and I'm working on a small project using it for 
>> > my university.
>> > 
>> > Is there a way I can get the Unit Tests you did for this 
>> > library as well as 
>> > stress testing and profiling documents (results).
>> > 
>> > If these documents are not available for free, I'll pay. Just 
>> > let me know if 
>> > they're available and how can I access them.
>> > 
>> > Thanks in advance.
>> > 
>> > Respectfully,
>> > Ghady DIAB
>> > --
>> > From: "Bruno Lowagie" 
>> > Sent: Wednesday, January 13, 2010 9:20 PM
>> > To: "Ghady Diab" 
>> > Cc: 
>> > Subject: Re: Unit Testing, Stress Testing, Profiling...
>> > 
>> > > Ghady Diab wrote:
>> > >> Hey,
>> > >>  It's Ghady DIAB from Lebanon. I'm really interested in 
>> > your iTextSharp 
>> > >> library (C#), and I'm working on a small project using it for my 
>> > >> university.
>> > >>  Is there a way I can get the Unit Tests you did for this 
>> > library as well 
>> > >> as stress testing and profiling documents (results).
>> > >>  If these documents are not available for free, I'll pay. 
>> > Just let me 
>> > >> know if they're available and how can I access them.
>> > >
>> > > That's not really a sales question. The people who write 
>> > unit tests are on 
>> > > the mailing list (Xavier Le Vourch, Kevin Day); you should 
>> > post your 
>> > > question there. The address is 
>> > itext-questions@lists.sourceforge.net but 
>> > > you should register first as I'm the only one who can 
>> > approve questions, 
>> > > and I'm teaching iText in Paris the next two days (meaning: 
>> > I probably 
>> > > won't be online much).
>> > > best regards,
>> > > Bruno
> 
> 

Re: [iText-questions] Unit Testing, Stress Testing, Profiling...

2010-01-14 Thread trumpetinc

For what it's worth, I've been able to create some pretty good content based
unit tests using the parser...  I have a filtering parser that I've put
together (haven't committed it yet) that allows you to specify a region of
the page to extract text from.  This makes it pretty easy to determine if
text was placed in the correct location.

It won't work for everything, of course, but for functionality related to
layout, etc... this may be useful.

Cheers,

- K 

-- 
View this message in context: 
http://old.nabble.com/Re%3A-Unit-Testing%2C-Stress-Testing%2C-Profiling...-tp27156414p27167317.html
Sent from the iText - General mailing list archive at Nabble.com.




Re: [iText-questions] How do I get the position of an image in a PDF file?

2009-12-21 Thread trumpetinc

I just committed some new (highly experimental) code to SVN (rev 4221).

See com.itextpdf.text.pdf.parser.ImageRenderListener

To see an example of registering a RenderListener with a PdfContentParser,
see PdfTextExtractor.getTextFromPage() in the same package (you'll pass your
own RenderListener in to the PdfContentStreamProcessor constructor).

The image part of this is all highly experimental and not at all well tested
(in fact, during your travels, if you happen to come up with some unit
tests, including smallish PDF files if necessary, be sure to let me know). 
Most of my effort right now is focused on improving text parsing, but if you
find things going on with the image side, let me know.

I'm also very open to suggestions for architectural changes related to the
structure of the render listeners (right now, the text and image listeners
have been intentionally kept separate - maybe that is good, maybe not).

I look forward to your feedback,

- K

On Fri, Nov 20, 2009 at 2:29 PM, trumpetinc2 
wrote:
[...]

>
> I can put something together if you are interested in testing it out and
> providing feedback

Sure, I'd like to give it a try.

Thanks Larry


-- 
View this message in context: 
http://old.nabble.com/How-do-I-get-the-position-of-an-image-in-a-PDF-file--tp26417166p26884194.html
Sent from the iText - General mailing list archive at Nabble.com.

