Re: [iText-questions] RTF Parser update

Howard Shank Wed, 05 Dec 2007 06:37:36 -0800

Hi Paulo,

Thanks for the comments. I am always looking for ways to improve my code.


With regard to the parser overall, initial testing on my system processed a 
9.5 MB RTF file in ~29 seconds. After optimization, it now takes only ~13 
seconds.

The HashMap is initialized using a value large enough to hold all the keys with 
10% free slots. The initial sizing of the HashMap ensures it doesn't have to 
rebuild itself during initialization, so there's no reallocation of memory 
occurring. The HashMap object consumes ~8k of memory.
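
As a rough illustration of the sizing idea, here is a minimal sketch. It is 
not the RtfCtrlWordMgr code; the class name, helper method and placeholder keys 
are made up for the example. One thing to keep in mind: java.util.HashMap 
resizes once the number of entries exceeds capacity x load factor, and the 
default load factor is 0.75, so the requested capacity has to account for the 
load factor as well as the key count (unless a different load factor is passed 
to the constructor).

```java
import java.util.HashMap;
import java.util.Map;

public class ControlWordMapSizing {
    // Default load factor used by java.util.HashMap.
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /** Smallest initial capacity that holds expectedKeys entries without a resize. */
    public static int capacityFor(int expectedKeys, float loadFactor) {
        return (int) Math.ceil(expectedKeys / (double) loadFactor);
    }

    public static void main(String[] args) {
        int controlWords = 1810; // count quoted in this thread
        int capacity = capacityFor(controlWords, DEFAULT_LOAD_FACTOR);

        Map<String, Integer> map = new HashMap<String, Integer>(capacity);
        for (int i = 0; i < controlWords; i++) {
            map.put("cw" + i, i); // placeholder keys, not real RTF control words
        }
        System.out.println("initial capacity: " + capacity + ", entries: " + map.size());
    }
}
```

For 1810 keys and the default load factor this requests a capacity of 2414, 
which HashMap rounds up internally to the next power of two.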

Here are the statistics for loading the RtfCtrlWordMgr object which includes 
initializing and loading the HashMap. I ran it with a non-static and static 
HashMap object. All times end up approximately the same on my system.
=Non Static HashMap========================
RtfCtrlWordMgr start date: Dec 5, 2007 9:31:09 AM
RtfCtrlWordMgr end date  : Dec 5, 2007 9:31:09 AM
  Elapsed time    : 141 milliseconds.
Begin Constructor RtfCtrlWordMgr , free mem is 1,321k
End Constructor RtfCtrlWordMgr , free mem is 1,166k
RtfCtrlWordMgr used approximately 155k
========================================
=Static HashMap===========================
RtfCtrlWordMgr start date: Dec 5, 2007 9:32:12 AM
RtfCtrlWordMgr end date  : Dec 5, 2007 9:32:12 AM
  Elapsed time    : 157 milliseconds.
Begin Constructor RtfCtrlWordMgr , free mem is 1,324k
End Constructor RtfCtrlWordMgr , free mem is 1,169k
RtfCtrlWordMgr used approximately 155k
=======================================
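
For anyone who wants to collect similar numbers, a measurement along these 
lines can be sketched as follows. This is an assumption about the general 
approach, not the code that produced the statistics above; LoadTimer and 
Measurement are hypothetical names. Free-memory deltas taken this way are only 
approximate, since the garbage collector or heap growth can change them 
between the two readings.

```java
import java.util.HashMap;
import java.util.Map;

public class LoadTimer {
    /** Result of one measurement: elapsed wall time and drop in free heap. */
    public static final class Measurement {
        public final long elapsedMillis;
        public final long usedKb;

        Measurement(long elapsedMillis, long usedKb) {
            this.elapsedMillis = elapsedMillis;
            this.usedKb = usedKb;
        }
    }

    /** Runs the given work once and reports elapsed time and free-memory change. */
    public static Measurement measure(Runnable work) {
        Runtime rt = Runtime.getRuntime();
        long freeBefore = rt.freeMemory();
        long start = System.nanoTime();

        work.run();

        long elapsedMillis = (System.nanoTime() - start) / 1000000L;
        long freeAfter = rt.freeMemory();
        return new Measurement(elapsedMillis, (freeBefore - freeAfter) / 1024L);
    }

    public static void main(String[] args) {
        // Stand-in workload: fill a map with placeholder keys.
        Measurement m = measure(new Runnable() {
            public void run() {
                Map<String, Object> map = new HashMap<String, Object>(4096);
                for (int i = 0; i < 1810; i++) {
                    map.put("cw" + i, new Object());
                }
            }
        });
        System.out.println("Elapsed time: " + m.elapsedMillis + " milliseconds.");
        System.out.println("Used approximately " + m.usedKb + "k");
    }
}
```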

Ultimately, there may be a way to perform lazy loading of the classes. I'll 
keep that in mind as I work through them.
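
One possible shape for that kind of lazy loading is sketched below. It is only 
an illustration under my own assumptions: the registry, the Handler interface 
and the sample control words are hypothetical and are not part of the current 
parser. The map holds cheap factories, and a handler object is only created 
the first time its control word is looked up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class LazyControlWordRegistry {
    /** Hypothetical handler interface; not the iText API. */
    public interface Handler {
        String handle(String param);
    }

    // Factories are cheap to register; handlers are created on first use.
    private final Map<String, Supplier<Handler>> factories = new HashMap<>();
    private final Map<String, Handler> instances = new HashMap<>();

    public void register(String word, Supplier<Handler> factory) {
        factories.put(word, factory);
    }

    /** Returns the handler for a control word, creating it on first lookup. */
    public Handler lookup(String word) {
        Handler handler = instances.get(word);
        if (handler == null) {
            Supplier<Handler> factory = factories.get(word);
            if (factory == null) {
                return null; // unknown control word
            }
            handler = factory.get();
            instances.put(word, handler);
        }
        return handler;
    }

    /** Number of handlers that have actually been created so far. */
    public int instantiatedCount() {
        return instances.size();
    }

    public static void main(String[] args) {
        LazyControlWordRegistry registry = new LazyControlWordRegistry();
        registry.register("b", () -> param -> "bold:" + param);
        registry.register("i", () -> param -> "italic:" + param);

        System.out.println(registry.lookup("b").handle("1"));
        System.out.println("handlers created: " + registry.instantiatedCount());
    }
}
```

With this arrangement a document that uses only a few control words pays for 
only those handlers, at the cost of one extra map lookup on first use.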

99% of the classes are not implemented yet, so for now the functionality is 
limited to duplicating the old process. Additionally, there are not thousands 
of elements. If I remember correctly, there are 1810 control words.

Each control word performs its own special function, and some control words 
perform multiple functions depending on the state of the document. Each 
extended control word class will eventually end up doing its own processing. 
So I'm not quite sure they can be collapsed into a generic class.
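
To make the trade-off concrete, here is a minimal sketch of a middle ground: a 
generic, parameterized class for control words that need no special 
processing, plus a subclass for a word whose behavior depends on document 
state. All names here (ControlWordWiring, Kind, ControlWord, StatefulWord, the 
inTable flag) are hypothetical and chosen only for illustration; this is not 
how the parser is currently wired.

```java
public class ControlWordWiring {
    /** Illustrative categories for control words. */
    public enum Kind { FLAG, TOGGLE, VALUE, DESTINATION }

    /** Generic handler configured by parameters instead of by subclassing. */
    public static class ControlWord {
        final String name;
        final Kind kind;
        final int defaultParam;

        public ControlWord(String name, Kind kind, int defaultParam) {
            this.name = name;
            this.kind = kind;
            this.defaultParam = defaultParam;
        }

        /** Default processing: report the word, its kind and the effective parameter. */
        public String process(Integer param, boolean inTable) {
            int value = (param != null) ? param.intValue() : defaultParam;
            return name + "(" + kind + ")=" + value;
        }
    }

    /** Specialized handler for a word whose behavior depends on document state. */
    public static class StatefulWord extends ControlWord {
        public StatefulWord(String name) {
            super(name, Kind.FLAG, 0);
        }

        @Override
        public String process(Integer param, boolean inTable) {
            // Made-up state check, standing in for real state-dependent logic.
            return name + (inTable ? ": handled inside table" : ": handled in body");
        }
    }

    public static void main(String[] args) {
        ControlWord fontSize = new ControlWord("fs", Kind.VALUE, 24);
        ControlWord special = new StatefulWord("example");

        System.out.println(fontSize.process(null, false));
        System.out.println(fontSize.process(Integer.valueOf(32), false));
        System.out.println(special.process(null, true));
    }
}
```

Only the words that genuinely need custom processing would get their own 
class; the rest could share the generic one until their behavior is 
implemented.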

I will however continue to look for ways to improve the code!

Howard

----- Original Message ----
From: Paulo Soares <[EMAIL PROTECTED]>
To: Post all your questions about iText here <[EMAIL PROTECTED]>
Sent: Wednesday, December 5, 2007 5:56:38 AM
Subject: Re: [iText-questions] RTF Parser update

I had a very quick look at the thousands of new classes created and I
have a few remarks:

- it looks like 99% of those classes could be eliminated and a generic
class with some parameters be created when loading the hash

- each time a parser is created the hash must be filled with thousands
of elements with only a few being actually used. If it's not possible to
have the hash as a static object at least it could be filled dynamically
as needed, saving time and memory

Paulo

> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On 
> Behalf Of Howard Shank
> Sent: Tuesday, December 04, 2007 9:48 PM
> To: Post all your questions about iText here
> Subject: [iText-questions] RTF Parser update
> 
> Hello everyone,
> 
> Just a quick note to let everyone know there was a pretty big 
> update to the RtfParser today. Mark Hall was gracious enough 
> to review and accept the changes and added the update to the 
> repository today.
> 
> New parser features:
> Correctly parses all control words, parameters and data.
> Uses BufferedReader for faster processing of input.
> New control word "wiring" architecture allows for easier 
> implementation of control words.
> Control Words defined in this update are from the RTF 
> Specification 1.9. (Does not include some application 
> specific extensions)
> 
> This update includes a rewrite of the parser and lots of new 
> "wiring" for handling the 1800+ RTF control words. The source 
> update size is approximately 12.5 MB.
> 
> The import functionality should work exactly as before and 
> does not require any changes to existing code using the 
> RtfWriter2. If you encounter any issues, please post a 
> description of the issue here, with a sample RTF file if 
> possible, and I will follow up as quickly as I can.
> 
> Further enhancements I am working on include:
> Handling info group data. i.e. author, subject, title, etc.
> Handling stylesheet mapping.
> Handling list table mapping.
> and more...
> 
> Regards,
> Howard Shank




      
