On 17/06/2007 2:27 AM, Huge Mountain wrote:
> Thank you all!
> 
> I still have another question:
> I'm trying to get the binary format of MS Excel file with this 
> book, excelfileformat.pdf from 
> http://sc.openoffice.org/excelfileformat.pdf 

As previously advised, you also need to read compdocfileformat.pdf -- 
see below.

> In part 2.2.2, page 11, we have:
> The following table lists names of possible streams.
> Stream name                       Contents
> Book                              BIFF5/BIFF7 workbook stream (➜5.1.3)
> Workbook                          BIFF8 workbook stream (➜5.1.3)
> <05H>SummaryInformation           Document settings
> <05H>DocumentSummaryInformation   Document settings
> Ctls                              Formatting of form controls
> User Names                        User names in shared workbooks (➜10)
> Revision Log                      Change tracking log stream (➜10)
> 
> I only care about BIFF8/8X. Are these all the streams that exist in
> BIFF8, with no others used?

There can be streams containing macros etc.
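
Whichever other streams are present, the cell data lives in the stream
named in the table above. A minimal sketch of picking it by name,
assuming the list of stream names has already been read from the
compound document's directory. The function name, the case-insensitive
comparison, and the preference for Workbook when both are present are
illustrative assumptions, not taken from the documentation:

```python
def pick_workbook_stream(stream_names):
    """Return the name of the stream that should hold the workbook data.

    'Workbook' is the BIFF8 stream; 'Book' is the BIFF5/BIFF7 stream.
    Assumption: if a file carries both, the BIFF8 one is preferred.
    """
    # Assumption: compare names case-insensitively.
    by_lower = {name.lower(): name for name in stream_names}
    for wanted in ("workbook", "book"):
        if wanted in by_lower:
            return by_lower[wanted]
    raise ValueError("no Book or Workbook stream found")


# The three streams reported for the small file discussed in this thread.
names = ["Workbook", "\x05SummaryInformation", "\x05DocumentSummaryInformation"]
print(pick_workbook_stream(names))  # Workbook
```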

> (In fact my Excel file, which I mentioned before, has only 3 streams:
> Workbook, <05H>SummaryInformation, <05H>DocumentSummaryInformation).
> Is all the content of an Excel file (the content of cells) stored in
> the Workbook stream?

All cell content is in the Workbook stream, except of course when an 
external reference is made (to cells in another file).

> Is the length of the Workbook
> stream unlimited

A worksheet can contain up to 256 columns and 65536 rows, and I am 
unaware of any limit on the number of worksheets. My pathology museum 
includes a 120 MB program-created file and a 40 MB manually-created 
file. A practical limit would be imposed by the amount of memory 
required to process the file.
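
To put numbers on the per-sheet limits just mentioned, here is a small
sketch. The constant and function names are made up for illustration;
only the two limits themselves come from the paragraph above:

```python
BIFF8_MAX_ROWS = 65536  # rows per worksheet
BIFF8_MAX_COLS = 256    # columns per worksheet (A to IV)


def fits_in_biff8_sheet(nrows, ncols):
    """True if a grid of nrows x ncols fits in a single BIFF8 worksheet."""
    return 0 <= nrows <= BIFF8_MAX_ROWS and 0 <= ncols <= BIFF8_MAX_COLS


print(BIFF8_MAX_ROWS * BIFF8_MAX_COLS)  # 16777216 cells at most per sheet
print(fits_in_biff8_sheet(70000, 10))   # False: too many rows
```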

> and is the usual value, 4096 bytes, just the minimum length of the
> Workbook stream?

There is no minimum size other than as dictated by the required 
contents. 4096 is the usual value for the minimum size of a *standard* 
stream. Streams smaller than that are either (a) included in the 
Short-Stream Container Stream ("SSCS") or (b) zero-padded to 4096 bytes 
and written as a standard stream. A reader must compare the stream's 
size (from its directory entry) with the minimum size of a standard 
stream and act accordingly. I have just created a small XLS file using 
Gnumeric 1.7.6 (it has 'x' in cell A1 of the sole worksheet); the size 
of the Workbook stream is 2057 bytes and this is included in the SSCS 
(as are the two .*SummaryInformation streams).
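
The size comparison described above can be sketched in Python as
follows. This is only an illustration based on the directory-entry
layout in compdocfileformat.pdf (128-byte entries, name size at offset
64, entry type at offset 66, first sector ID at offset 116, stream size
at offset 120); the function names are made up, and the 4096 default is
an assumption standing in for the value a real reader would take from
offset 56 of the file header:

```python
import struct


def parse_dir_entry(entry):
    """Parse one 128-byte compound-document directory entry."""
    if len(entry) != 128:
        raise ValueError("a directory entry is 128 bytes")
    # Size of the UTF-16LE name in bytes, including the trailing NUL.
    (name_size,) = struct.unpack_from("<H", entry, 64)
    name = entry[:max(name_size - 2, 0)].decode("utf-16-le")
    entry_type = entry[66]  # 2 = user stream, 5 = root storage
    first_sec_id, size = struct.unpack_from("<iI", entry, 116)
    return name, entry_type, first_sec_id, size


def stream_location(size, min_standard_size=4096):
    """Say where a stream of the given size is stored."""
    return "SSCS" if size < min_standard_size else "standard"


# Build a fake entry resembling the Workbook stream of the small file.
raw_name = "Workbook".encode("utf-16-le") + b"\x00\x00"
entry = bytearray(128)
entry[:len(raw_name)] = raw_name
struct.pack_into("<H", entry, 64, len(raw_name))
entry[66] = 2
struct.pack_into("<iI", entry, 116, 0, 2057)

name, entry_type, first_sec_id, size = parse_dir_entry(bytes(entry))
print(name, size, stream_location(size))  # Workbook 2057 SSCS
```

For a stream in the SSCS, the first sector ID indexes the short-sector
allocation table rather than the main one, which is why the comparison
has to happen before any data is read.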

> Please help me to know more clearly!

I am very curious about the direction that you are taking -- you appear 
to want to write an XLS reader yourself, based on an extremely cursory 
reading of some of the available documentation and a sample of one tiny 
XLS file. You have been pointed at existing working implementations in 
C, Python, Perl and Java, but appear not to be interested in those, not 
even in reading their source code. Must you re-invent the wheel? What 
language are you going to use? If C++, consider (1) using the Gnumeric C 
source or (2) digging in the source of OpenOffice.org's Calc. You'd no 
doubt find implementations in PHP, Delphi, etc. if you google hard enough.

Using existing implementations can be trivially easy; here I demonstrate 
digging that 'x' out of the tiny Gnumeric-created XLS file using 
Python interactively:

| >>> import xlrd
| >>> book = xlrd.open_workbook('c:/excel_misc/1cell_gnu.xls')
| >>> sheet = book.sheet_by_index(0)
| >>> sheet.name
| u'Sheet1'
(The u prefix means the value is a Unicode string.)
| >>> sheet.ncols
| 1
| >>> sheet.nrows
| 1
| >>> sheet.row_values(0)
| [u'x']
| >>> sheet.row_values(0)[0]
| u'x'


HTH,
John
_______________________________________________
gnumeric-list mailing list
gnumeric-list@gnome.org
http://mail.gnome.org/mailman/listinfo/gnumeric-list
