Hi,

On 19.04.2013 00:50, Maruan Sahyoun wrote:
I think there are different levels to think about which are interwoven

a) which use cases do we support - parsing, text extraction, merging, form 
filling, viewing, creation …. - do we need more? can we drop some?
Perhaps the creation of PDFs using a higher-level API, but that's nothing
for the near future

b) do we have a good architecture to support these use cases
There are already some discussions about refactoring/reimplementing. I guess we
will find a lot of things which can be done better. We'll see ...

c) how do we organize the major parts - I think there is already a feeling that 
pdfbox should be modularized in one way or the other
That's one of the main topics

d) which dependencies do we have and where (might belong to b) - e.g. is it a 
good idea that PDDocument needs awt? So where are the boundaries from byte/file 
level to COS to PD model to app/tools/utilities …
Some of those dependencies have to be moved/removed when creating modules

e) which PDF functionality is missing e.g. do we need to have a better support 
for different PDF versions
There is a lot missing, but we'll see what's important (to us or any
contributor).

f) efficiency, memory consumption e.g. do we need something like lazy loading
It's always a good idea to think about those topics :-)

g) as Thomas wrote type safety, generics …. - maybe better object orientation 
e.g. today some parsing is done in the parsers, some is done in the COS objects 
(COSString)
I'd like to remove all deprecated stuff as well

…….

which is the API we agree to keep stable. Is it COS… , PD …..

Thinking about these 'levels' doesn't mean that we do have to address all of 
these immediately (or at all) but it will help to set the expectations.


My initial thoughts are

2.0
o get the API levels right byte/token -> COS -> PD -> Utils/awt/.…. -> apps 
(Debugger, Reader… - will we keep all of them?)
o "guarantee" PD level for 2.x -> that's our API which means we can freely 
change everything above and below in the 2.x branch. Document that!
o type safety, collections  ….  on PD level first

2.1
o improved parser
o improved object model below PD e.g. decide if parsing from tokens to COS is 
done in parser or COS object but not mixed
o more type safety
…..

2.2
o improved writer
o new "incremental" writer/stamper
o handling of non WinAnsi
I hope to get that done sooner. I already have a working prototype...

BR
Maruan

BR
Andreas Lehmkühler
