Sorry I hit ctrl-enter instead of enter
On Sun, 2002-03-03 at 22:37, Said Ackley wrote:
> Andy,
>
> Thanks for the heads up. Just a warning, this is a very long e-mail. Sorry
> to ramble on : )
>
> I have been wondering what direction I should take. Right now I've been
> working on transferring the low level functionality from WordDocument to
> HDFObjectFactory. I've been using FileInformationBlock. I also wanted to
> bring all of my data types to the hdftypes package.
>
> I had a couple of ideas about writing to a Word document I wanted to share
> with you and everybody else and see what you think. I think that instead of
> creating a new Word document from scratch, we should use a blank document as
> a template like Word does (normal.dot) or any template a user wants for that
> matter. I think this is the right way to go for the following reasons.
>
+1 - Since that is the way Word does it.
> -It would be MUCH less error prone. If we don't have some obscure structure
> in the word file it could blow up. You mentioned you guys have experienced
> this firsthand with HSSF. There are at least half a dozen structures
> referenced in the FIB that I can't find any documentation on whatsoever. If
> we use a template our whole file structure is already there and we only have
> to insert the needed stuff and tweak the necessary structure fields.
>
> -It would make development a much simpler and enjoyable process and also a
> lot faster.
>
-1 we still need to learn how to read/write these fields and document
what they mean. The point behind POI is to be intellectually
challenging. We're implementing convoluted file formats in a language
with poor file i/o (etc).
> -This is how Word creates a new blank document.
>
+1
> We would still have to have field_x for every field in a structure up to a
> point. For example, right now, in the FIB, the highest field number I need
> is like 93. We wouldn't need to get anything after the highest we need
> because then we could just load the first 93 fields from the template,
> change what we need in memory, then write to the first 93 fields in the
> actual file. It may not be this simple in practice because of the nature of
> the OLE2 file format, but you get the idea.
>
-1 - we still need to learn these and implement them. Don't worry, I
think Glen has sold me on generation ;-)
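
For anyone following along, here is a rough Java sketch of the "load the
leading fields from a template, change what we need in memory, write them
back" idea. Everything in it is an assumption made up for illustration: the
class and method names, the 32-byte stand-in header and the field offset are
hypothetical and are not the real FIB layout or any existing POI API. The
only thing taken from the format is that multi-byte values are stored
little-endian.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

class TemplateHeaderSketch {
    /** Returns a copy of the template header with one 32-bit little-endian field replaced. */
    static byte[] withIntField(byte[] templateHeader, int offset, int value) {
        // Work on a copy so the template bytes stay untouched.
        byte[] copy = Arrays.copyOf(templateHeader, templateHeader.length);
        ByteBuffer.wrap(copy).order(ByteOrder.LITTLE_ENDIAN).putInt(offset, value);
        return copy;
    }

    /** Reads a 32-bit little-endian field back out of a header. */
    static int readIntField(byte[] header, int offset) {
        return ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
    }

    public static void main(String[] args) {
        // Stand-in for the leading bytes of a template document (hypothetical contents).
        byte[] template = new byte[32];
        Arrays.fill(template, (byte) 0x7F);

        int hypotheticalOffset = 8; // NOT a real FIB offset
        byte[] patched = withIntField(template, hypotheticalOffset, 1035);

        System.out.println(readIntField(patched, hypotheticalOffset));
        System.out.println(patched[0] == template[0]); // untouched bytes are carried over
    }
}
```

Bytes we never touch are carried over from the template as-is, which is the
attraction of the approach, but it also means we would be copying structures
we do not understand yet.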
> Another thing I wanted to point out was that there are many data structures
> that are compressed on file (CHP, PAP, SEP, TAP) they only exist
> uncompressed in memory. I added every field to these structures anyway
> because I wasn't sure what was what. My point is these will be a snap to create,
> because on file they are simply deltas from a style in the stylesheet. Oh
> yeah, and the template would contain the default styles so we wouldn't have
> to create these from scratch.
>
-1 We need to learn how to set these in order to properly modify/etc the
file.
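
A small Java sketch of the "deltas from a style" idea, under loud
assumptions: properties are modelled as a plain name-to-int map, and the
names (bold, fontSize, italic) and the classes are hypothetical. The real
on-file encoding of CHP/PAP deltas is a packed binary format that this does
not attempt to reproduce; the sketch only shows the compress/expand round
trip.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PropertyDeltaSketch {
    /**
     * Keeps only the properties whose values differ from the base style.
     * Assumes 'full' carries a value for every property of interest.
     */
    static Map<String, Integer> compress(Map<String, Integer> style, Map<String, Integer> full) {
        Map<String, Integer> delta = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : full.entrySet()) {
            if (!e.getValue().equals(style.get(e.getKey()))) {
                delta.put(e.getKey(), e.getValue());
            }
        }
        return delta;
    }

    /** Rebuilds the full in-memory property set by applying the delta on top of the style. */
    static Map<String, Integer> expand(Map<String, Integer> style, Map<String, Integer> delta) {
        Map<String, Integer> full = new LinkedHashMap<>(style);
        full.putAll(delta);
        return full;
    }

    public static void main(String[] args) {
        Map<String, Integer> style = new LinkedHashMap<>();
        style.put("bold", 0);
        style.put("fontSize", 24);
        style.put("italic", 0);

        Map<String, Integer> run = new LinkedHashMap<>(style);
        run.put("bold", 1); // this run differs from its style only in 'bold'

        Map<String, Integer> delta = compress(style, run);
        System.out.println(delta.size());                     // only the changed property is kept
        System.out.println(expand(style, delta).equals(run)); // round trip restores the full set
    }
}
```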
> In conclusion, if we use templates we can have writing to and creating Word
> documents done in no time.
>
+1
> The following is an outline of the algorithm I would use to write text to a
> Word document (It's been a while, so this may not be completely accurate ;-)
>
> 1) Add or insert the text in the area specified between FIB.fcMin and
> FIB.fcMac. Adjust FIB.fcMac and FIB.ccpText.
>
+1
> 2) Change the text piece in the text piece table to include the new offset
> range or add a new piece.
>
+1
> 3) Insert the appropriate properties (Character, Paragraph, Section, in
> compressed form) in the appropriate FKP(s). Adjust text offset values for
> the properties that follow the inserted properties.
>
+1
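
The three steps can be sketched in Java roughly as follows. This is a toy
model, not the real thing: it assumes one byte per character, keeps the text
in a StringBuilder instead of an OLE2 stream, and uses a single hypothetical
Run list as a stand-in for both the text piece table and the property
entries in the FKPs. Only the names fcMin, fcMac and ccpText come from the
outline; the class and everything else is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class TextInsertSketch {
    /** A span of text [start, end) in file offsets that shares one set of properties. */
    static final class Run {
        final int start;
        final int end;
        Run(int start, int end) { this.start = start; this.end = end; }
    }

    final int fcMin;   // file offset of the first text character
    int fcMac;         // file offset just past the last text character
    int ccpText;       // character count of the main text
    final StringBuilder text = new StringBuilder();
    final List<Run> runs = new ArrayList<>();

    TextInsertSketch(int fcMin, String initial) {
        this.fcMin = fcMin;
        text.append(initial);
        fcMac = fcMin + initial.length();
        ccpText = initial.length();
        runs.add(new Run(fcMin, fcMac));
    }

    /** Inserts s at file offset fc as a new run, shifting everything that follows. */
    void insert(int fc, String s) {
        if (fc < fcMin || fc > fcMac) {
            throw new IllegalArgumentException("offset outside the text area");
        }
        int len = s.length();

        // Step 1: put the text in place and adjust fcMac and ccpText.
        text.insert(fc - fcMin, s);
        fcMac += len;
        ccpText += len;

        // Steps 2 and 3: add a run for the new text, split a run if the insert
        // lands inside it, and shift the offsets of every run that follows.
        List<Run> updated = new ArrayList<>();
        boolean placed = false;
        for (Run r : runs) {
            if (r.end <= fc) {
                updated.add(r); // entirely before the insert point, unchanged
                continue;
            }
            if (r.start < fc) {
                updated.add(new Run(r.start, fc)); // left half of a split run
            }
            if (!placed) {
                updated.add(new Run(fc, fc + len)); // the inserted text
                placed = true;
            }
            updated.add(new Run(Math.max(r.start, fc) + len, r.end + len));
        }
        if (!placed) {
            updated.add(new Run(fc, fc + len)); // insert at the very end
        }
        runs.clear();
        runs.addAll(updated);
    }

    public static void main(String[] args) {
        TextInsertSketch doc = new TextInsertSketch(1024, "Hello world");
        doc.insert(1030, "big ");
        System.out.println(doc.text + " fcMac=" + doc.fcMac + " ccpText=" + doc.ccpText);
        for (Run r : doc.runs) {
            System.out.println("[" + r.start + ", " + r.end + ")");
        }
    }
}
```

Even in this toy the bookkeeping for split runs is most of the code, which
hints at where the real effort will go.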
> That's it, I think. That's how simple it would be (I hope).
>
-1 - It won't be - I promise. The point is to provide us with
intellectual challenge. Originally the project was started because I
was bored just doing easy ol' webapps. (which is about as technically
challenging as operating toenail clippers BTW - you all can quote me on
that -- not saying it can't be fun but it's not challenging). Trust me,
if it were that easy, there would be lots of POI projects around that
had done it before. There were only like 2 or 3 commercial projects
that had done the Excel thing before (and none were under like $5K).
Judging from this and some of the other information we've seen - Word is
harder. So let's template some things, but our goal should be to *get it
right*. Let's get something that is functional or *semi-functional* asap,
but again our goal should be to get it correct and document it. This is
how the project becomes way more important than itself ("the cause"): if
we crack these file formats wide open, it can make a big difference in
the years to come. So I'm not opposed to using the template method
(though I'm not sure we shouldn't do it in class files rather than doc
files) -- but we want to create a more correct implementation of M$'s
file formats than even M$ has! So we still need to *get it right*, no
shortcutting around that. Quality is Job #1 (cept when we roll our SUVs
over people and blame it all on the tires and not our marketing plan of
pushing trucks with a high center of gravity on families and
inexperienced drivers while making the problem worse in every model).
-Andy
> Ryan
>
>
>
> ----- Original Message -----
> From: "Andrew C. Oliver" <[EMAIL PROTECTED]>
> To: "Said Ackley" <[EMAIL PROTECTED]>
> Cc: "POI Development" <[EMAIL PROTECTED]>
> Sent: Sunday, March 03, 2002 1:08 PM
> Subject: refactoring
>
>
> > Hi Ryan,
> >
> > I've been mostly alternating between learning about the Word format,
> > reading through your code and attempting to tap out the FIB (sadly I got
> > 1/4 of the way through it before a corrupt copy of xscreensaver crashed
> > me out).
> >
> > So I started thinking... gee Ryan's out there in space (no pun intended
> > ;-) ), he just donated some code and there I am doing things to it
> > without explaining anything and he's wondering "gee what do I do now?"
> > and wondering what a big jerk I am ;-).
> >
> > So I thought I'd check in and try and involve anyone else interested in
> > the development of HDF.
> >
> > Take a look at the Vision for 2.0... that's the general plan. I'd like
> > to get us to where we can create an abstract low level and high level
> > data model for HDF. Next release we'll work on integrating into Cocoon
> > or maybe FOP (the XML parts), or maybe if Ken is super bored one day he
> > might do that for us ;-).
> >
> > In reading the word document you have the luxury of skipping all of the
> > useless fields, we'll need to pay attention to those fields in order to
> > write the word documents out. So I'm working on creating "types" for
> > each of those binary structures based on your code and any other
> > information I find.
> >
> > We can do this a number of ways, I'm just currently typing out the
> > private member variables, from there we'll need to create getters and
> > setters.
> >
> > Glen has a "record generator" he developed for HSSF. If he's willing to
> > give us a hand, maybe we should consider using it or adapting it to let
> > us describe via XML and generate these structures.
> >
> > Once we get the low level structures, the first goal should be to read
> > in a very simple doc and write it back out. Once we get there, start
> > working on a basic high level API (using familiar document objects like
> > Document, Page, Paragraph etc etc rather than those nasty four letter
> > abbreviations).
> >
> > How do you feel about what I've started on (other than not being so sure
> > you want to type field_x_ before everything)? What do you want to do?
> > Is there anything you don't like?
> >
> > Would anyone out there like to lend a hand? At the moment, the biggest
> > thing we need is to create the types so we can start trying to fill them
> > and create something like org.apache.poi.hssf.dev.biffviewer to debug
> > our understanding.
> >
> > Any thoughts from anyone? Am I leaving anyone out?
> >
> > Thanks,
> >
> > Andy
> > --
> > http://www.superlinksoftware.com
> > http://jakarta.apache.org - port of Excel/Word/OLE 2 Compound Document
> > format to java
> > http://developer.java.sun.com/developer/bugParade/bugs/4487555.html
> > - fix java generics!
> > The avalanche has already started. It is too late for the pebbles to
> > vote.
> > -Ambassador Kosh
> >
>
--
http://www.superlinksoftware.com
http://jakarta.apache.org - port of Excel/Word/OLE 2 Compound Document
format to java
http://developer.java.sun.com/developer/bugParade/bugs/4487555.html
- fix java generics!
The avalanche has already started. It is too late for the pebbles to
vote.
-Ambassador Kosh