Re: [discuss] Re: Article: OpenDocument vs MS XML

2005-11-28 Thread Wesley Parish
On Mon, 28 Nov 2005 03:40, mark wrote:
> Daniel Carrera wrote:
> > Wesley Parish wrote:
> >> I suspect Microsoft dragged over some of their programming gurus from
> >> arcane C/C++-using projects to draft this standard, because it's got
>
> 
> "Arcane"? Uh, you mean like OpenOffice.org's codebase? Or all of Linux?
> Or Firefox?

I'm referring to their (in)famous Hungarian notation - if that's the correct 
word; it's been a while since I've read those magazines.  ;)

(Speaking about codebases, I'm going to try reading konqueror and koffice - 
while trying to sort out a heap of old Unix and DOS Public Domain source code 
to make something useful from it ... it's there, it's minuscule in terms of 
memory usage, and I being a bear of very small brain, think that small is 
beautiful  <;)

Wesley Parish
>
>   mark "yes, I *am* a programmer"

-- 
Clinersterton beademung, with all of love - RIP James Blish
-
Mau e ki, he aha te mea nui?
You ask, what is the most important thing?
Maku e ki, he tangata, he tangata, he tangata.
I reply, it is people, it is people, it is people.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [discuss] Re: Article: OpenDocument vs MS XML

2005-11-27 Thread Alexandro Colorado
On Mon, 28 Nov 2005 00:04:30 -, Henrik Sundberg <[EMAIL PROTECTED]>  
wrote:



2005/11/27, Daniel Carrera <[EMAIL PROTECTED]>:

Irrelevant comparison. Document files are not programs. OOo is a 60 MB
program, not a 192 KB document. OOo does rendering, memory allocation,
loads external libraries, runs threads, and does a zillion other things
that documents don't do.


I was thinking of the files from the "memory hog" discussions found at
http://blogs.zdnet.com/Ou/?p=101

The unzipped XML (SXC) is 286 MB. Almost 5 times larger than OOo. The
MS XML equivalent was 193 MB. It is my firm belief that the parsing
time of the difference (93 MB) is noticeable.
The SXC file was only 3.6 MB, but the uncompressed size has to be
traversed in memory at least.
/Henrik



Sorry, I haven't really followed this topic, but I thought I'd throw this out.

Federico Mena Quintero is a developer from Ximian who has done a lot of  
performance testing on GNOME.


http://primates.ximian.com/~federico/news-2005-10.html#oocalc-performance

There is some really interesting information from when he went through it with sysprof.


--
Alexandro Colorado
CoLeader of OpenOffice.org ES
http://es.openoffice.org




Re: [discuss] Re: Article: OpenDocument vs MS XML

2005-11-27 Thread Daniel Carrera

Henrik Sundberg wrote:

I was thinking of the files from the "memory hog" discussions found at
http://blogs.zdnet.com/Ou/?p=101

The unzipped XML (SXC) is 286 MB.


Well... alright, if you have a file that large, the file size makes 
quite a difference. I was talking about typical files. For something 
that large I would question the wisdom of using XML at all. A database 
seems like the right tool. Obviously XML isn't the right tool for every 
job. You wouldn't want to store images or music in XML.



Almost 5 times larger than OOo. The
MS XML equivalent was 193 MB. It is my firm belief that the parsing
time of the difference (93 MB) is noticeable.


The time required to parse 93MB is negligible compared to the time 
required to *swap* 93MB. Memory and CPUs are several orders of magnitude 
faster than disc access.


Cheers,
Daniel.
--
 /\/`) http://oooauthors.org
/\/_/  http://opendocumentfellowship.org
   /\/_/  No trees were harmed in the creation of this email.
   \/_/   However, a significant number of electrons were
   /  were severely inconvenienced.




Re: [discuss] Re: Article: OpenDocument vs MS XML

2005-11-27 Thread Henrik Sundberg
2005/11/27, Daniel Carrera <[EMAIL PROTECTED]>:
> Irrelevant comparison. Document files are not programs. OOo is a 60 MB
> program, not a 192 KB document. OOo does rendering, memory allocation,
> loads external libraries, runs threads, and does a zillion other things
> that documents don't do.

I was thinking of the files from the "memory hog" discussions found at
http://blogs.zdnet.com/Ou/?p=101

The unzipped XML (SXC) is 286 MB. Almost 5 times larger than OOo. The
MS XML equivalent was 193 MB. It is my firm belief that the parsing
time of the difference (93 MB) is noticeable.
The SXC file was only 3.6 MB, but the uncompressed size has to be
traversed in memory at least.
/Henrik




Re: [discuss] Re: Article: OpenDocument vs MS XML

2005-11-27 Thread Daniel Carrera

Henrik Sundberg wrote:

I'd say that smaller files are faster than bigger.


The slowdown due to the size increase is infinitesimal. See below for 
an example. It's like arguing that you should use short variable names in 
your Python program because that will make it run faster. Anyone who 
knows how to program knows that that's a stupid idea.



Memory is slow.


No, memory is fast. Transfer rates of 1,000-2,000 MB/sec mean that for 
a 50-page document (details below) you can expect to save at most 
0.00014 seconds by using smaller tags.



Disks are slow.


The transfer rate of an IDE disk is on the order of 100 Mbit/s. The 
INGOTs handbook is a 50-page document with lots of tables. It is 192 KB, 
so the disk access part of the process contributes 0.015 seconds to the 
loading time. I just wrote a Perl program to remove all the paragraph 
and table tags (this is unreasonable of course, since you still have to 
have some tags). The result was 48 KB. This means that, for this document, 
using small tags would save you *less* than 0.011 seconds in loading 
time. And in exchange for that you would get a buggier program.
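
The estimates above can be reproduced with a short Python sketch. The unit
conventions here (1 KB = 1000 bytes, 1 Mbit = 10^6 bits) and the use of the
lower 1,000 MB/s memory figure are assumptions, so the last digit may differ
slightly from the numbers quoted; seek time and latency are ignored.

```python
# Back-of-the-envelope transfer times for the document sizes discussed above.
# Assumption: decimal units (1 KB = 1000 bytes, 1 Mbit = 10**6 bits).

def transfer_seconds(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Time to move size_bytes at a sustained rate, ignoring seeks and latency."""
    return size_bytes / rate_bytes_per_s

KB = 1000
DISK_RATE = 100e6 / 8      # 100 Mbit/s expressed in bytes per second
MEMORY_RATE = 1000e6       # lower end of the 1,000-2,000 MB/s range

full_doc = 192 * KB        # the 50-page document
stripped_doc = 48 * KB     # same document with paragraph and table tags removed
tag_bytes = full_doc - stripped_doc

disk_load = transfer_seconds(full_doc, DISK_RATE)
disk_saving = transfer_seconds(tag_bytes, DISK_RATE)
memory_saving = transfer_seconds(tag_bytes, MEMORY_RATE)

print(f"disk time to read the full document: {disk_load:.4f} s")
print(f"disk time saved by dropping the tags: {disk_saving:.4f} s")
print(f"memory time saved by dropping the tags: {memory_saving:.6f} s")
```

Under these assumptions the saving is on the order of a hundredth of a second
from disk and about a ten-thousandth of a second from memory.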



Hashing long strings is slower
than hashing short ones (for symbol table look up).


No, symbol look up for a longer symbol is *not* slower.


Parsing shorter
files takes less time than parsing longer ones.


False. Using a  is not slower than .


It takes longer to start large programs as well.


Irrelevant comparison. Document files are not programs. OOo is a 60 MB 
program, not a 192 KB document. OOo does rendering, memory allocation, 
loads external libraries, runs threads, and does a zillion other things 
that documents don't do.


Cheers,
Daniel.
--
 /\/`) http://oooauthors.org
/\/_/  http://opendocumentfellowship.org
   /\/_/  No trees were harmed in the creation of this email.
   \/_/   However, a significant number of electrons were
   /  were severely inconvenienced.




Re: [discuss] Re: Article: OpenDocument vs MS XML

2005-11-27 Thread Henrik Sundberg
2005/11/26, Daniel Carrera <[EMAIL PROTECTED]>:
> Of course it "can" be abbreviated. What I'm saying is that abbreviating it
> is not going to give you the benefit that you think it will. It will not
> speed up parsing, it will not make the file load faster. It will save
> disk space, but I doubt that disk space is the primary concern for most
> people who have documents.

I'd say that smaller files are faster than bigger ones. Of course they
are. Memory is slow. Disks are slow. Hashing long strings is slower
than hashing short ones (for symbol table look up). Parsing shorter
files takes less time than parsing longer ones.
It takes longer to start large programs as well.

The effect of this ought to be fairly easy to check with any XML
parser. Create a large (so the parsing time is easy to measure) XML
file with a few different short tags and time the parser. Replace all
tags with long names and check the parsing time again. Repeat the
tests a few times to get more reliable values.
/Henrik
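
The experiment described above could be sketched in Python roughly as
follows. The tag names, the cell count and the choice of
xml.etree.ElementTree as the parser are assumptions made for illustration;
none of them comes from this thread, and results will vary with the parser
and the machine.

```python
import time
import xml.etree.ElementTree as ET

def make_document(tag: str, cells: int) -> bytes:
    """Build a flat XML document containing `cells` elements named `tag`."""
    body = "".join(f"<{tag}>arin</{tag}>" for _ in range(cells))
    return f"<root>{body}</root>".encode("utf-8")

def best_parse_time(data: bytes, repeats: int = 5) -> float:
    """Parse `data` several times and return the fastest run in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        ET.fromstring(data)
        timings.append(time.perf_counter() - start)
    return min(timings)

CELLS = 100_000                                     # hypothetical size
SHORT_TAG = "c"                                     # hypothetical short name
LONG_TAG = "table-cell-with-a-long-descriptive-name"  # hypothetical long name

short_doc = make_document(SHORT_TAG, CELLS)
long_doc = make_document(LONG_TAG, CELLS)

short_time = best_parse_time(short_doc)
long_time = best_parse_time(long_doc)

print(f"short tags: {len(short_doc):>10,} bytes, {short_time:.3f} s")
print(f"long tags:  {len(long_doc):>10,} bytes, {long_time:.3f} s")
```

Taking the minimum of several runs reduces noise from other processes. Note
that this measures one parser only, on input already held in memory, so it
says nothing about disk access or about what an application does after
parsing.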




Re: [discuss] Re: Article: OpenDocument vs MS XML

2005-11-27 Thread mark

Daniel Carrera wrote:

Wesley Parish wrote:

I suspect Microsoft dragged over some of their programming gurus from 
arcane C/C++-using projects to draft this standard, because it's got 


"Arcane"? Uh, you mean like OpenOffice.org's codebase? Or all of Linux? 
Or Firefox?


mark "yes, I *am* a programmer"

--
FDR: We have nothing to fear but fear itself.
GWB: Be afwaid. Be vewwy afwaid.




Re: [discuss] Re: Article: OpenDocument vs MS XML

2005-11-27 Thread Daniel Carrera

Randomthots wrote:
That's arguable. Comparing the time it takes to zip the archive with 
7-zip vs. the time it takes OOo to save the file, I would estimate that 
the compression step takes up maybe 20% of the total time at most.


20% would have been my guess. I never thought that the zipping step was 
dominant. I would guess that the other 80% is mostly due to a 
combination of (1) XML parsing and (2) OOo just being slow.


Incidentally, I made a PmWiki plugin to export wiki pages to 
OpenDocument. It is much faster than OOo for comparable documents.



Tell me please, Daniel, what extra information is contained in the xml 
snippet:
office:value-type="string">arin


that isn't contained in: ,"arin",


I think this is a bad example. You purposely picked a cell that had as 
little information as possible. But even in your example, you can see 
that there is a paragraph (and not two or three), that the element is a 
cell belonging to a table (as opposed to a header, or a drawing), and 
that the cell has a type. There is also style information in 
content.xml, styles.xml, settings.xml and meta.xml that include the cell 
properties (border, size, width, font, author, date, revisions - if any, 
etc.)


Of course, if this additional information is not interesting to you, and 
you are not interested in being able to have additional information 
beyond what can be contained in a CSV file, then you are better off 
using CSV files. But it is unfair to blame OpenDocument for not being as 
fast or as small as another format that is more specialized and nowhere 
near as "powerful".


It's like when people send you a Word attachment with just text. You can 
complain that the Word file is unnecessary, but you are not surprised 
that it's bigger than the plain text version. And you don't claim that 
.doc files should be as small as .txt files for the same content.


But all this is still a strawman you built because what started this 
thread was your claim that small tags would make OOo faster. They won't.


You can't just compare CSV vs OpenDocument and conclude that the 
problem is the size of the XML tags. That's plain silly.


In this particular case, it's not silly at all.


It is because there are many other things that could be causing the 
problems you experience and you just picked one at random.



I realize this is not a normal case.


Indeed, it is not. It is also not related to your original claim, that a 
smaller tag was better. It's like saying that your Python programs would 
run faster if you used shorter variable names. It's a silly "optimization". 
Instead you should look at how the program is designed. Every programmer 
knows that those silly optimizations do more harm than good.


It's more like a controlled 
experiment where you remove as many variables as you can in order to 
study the particular phenomenon of interest.


No, it's more like a strawman when you claim that smaller tags would 
make OOo faster and use CSV to "prove" your claim.


Cheers,
Daniel.
--
 /\/`) http://oooauthors.org
/\/_/  http://opendocumentfellowship.org
   /\/_/  No trees were harmed in the creation of this email.
   \/_/   However, a significant number of electrons were
   /  were severely inconvenienced.




Re: [discuss] Re: Article: OpenDocument vs MS XML

2005-11-27 Thread Daniel Carrera

Wesley Parish wrote:
I suspect Microsoft dragged over some of their programming gurus from arcane 
C/C++-using projects to draft this standard, because it's got the feeling of 
the Microsoft Standard variable-naming procedures that I've seen discussed in 
various programming magazines here and there.


A lot of people suspect that they just made an XML dump of their DOM 
objects. That would be a very lazy way to make an XML format. Of course 
it misses the whole point of XML, but why should they care?


Cheers,
Daniel.
--
 /\/`) http://oooauthors.org
/\/_/  http://opendocumentfellowship.org
   /\/_/  No trees were harmed in the creation of this email.
   \/_/   However, a significant number of electrons were
   /  were severely inconvenienced.




Re: [discuss] Re: Article: OpenDocument vs MS XML

2005-11-27 Thread Wesley Parish
On Sat, 26 Nov 2005 23:22, Daniel Carrera wrote:
> Randomthots wrote:
> > 1. Does Microsoft's XML standard now encompass all document types? Last
> > I knew they only had an XML format for Word.
>
> Microsoft's FAQ says:
>

> > I notice that in the examples cited in the article that MS tends to use
> > very short tags like , whereas the OD tags are full words like
> > . I realize this aids in human
> > readability but most of the time... who cares? I'm not going to be
> > reading the raw file anyway.
>
> Please read the top of the article. It explains why you should care
> about which format is understandable. Because the developer who is
> writing the application you want to use needs to understand it and know
> how to use it. And the more understandable the format is, the better the
> support, and the better the compatibility.
>
> Understandability/simplicity/etc has a DIRECT effect on things you do
> care about like how many applications support it, and whether you can
> reasonably expect a file produced by one to be read by another (i.e.
> interoperability).
>
> And interoperability is the whole point of using XML. If you don't care
> about a developer understanding the format, you might as well be using
> Microsoft's .doc.
>
> Using obscure tags like  is gratuitous obscurity. It makes it
> harder for competitors to understand the format and support it for no
> benefit.
>
> Daniel.

In particular, anyone who's only ever used HTML before, would find 
himself/herself comfortable with ODF very quickly.

I suspect Microsoft dragged over some of their programming gurus from arcane 
C/C++-using projects to draft this standard, because it's got the feeling of 
the Microsoft Standard variable-naming procedures that I've seen discussed in 
various programming magazines here and there.

Be that as it may, it's not the way the various Markup Languages have been 
designed and taught, with a focus on simplicity and clarity of expression.

It's their problem, not ours.

Wesley Parish
-- 
Clinersterton beademung, with all of love - RIP James Blish
-
Mau e ki, he aha te mea nui?
You ask, what is the most important thing?
Maku e ki, he tangata, he tangata, he tangata.
I reply, it is people, it is people, it is people.




[discuss] Re: Article: OpenDocument vs MS XML

2005-11-26 Thread Randomthots

Daniel Carrera wrote:


Randomthots wrote:

The number of characters has no effect on speed. There is no reason 
why  is faster to parse than .



I'm sorry, Daniel, but I find that hard to believe.

I have a file that is strictly text, numbers, and dates. Seven columns 
by 63,260 rows -- no formulas, no formatting. Importing as csv takes a 
few seconds. Converted to ods it takes *much* longer to load -- around 
30 seconds or so.



What makes you think that the reason for the slowdown is because 
OpenDocument uses verbose tags instead of hard to understand tags? The 
size of the tag has essentially *zero* effect on speed.


For one particular tag, or for a normally sized spreadsheet, I'm sure 
you're right. But even a little bit has to add up. In that particular 
file the tag sequence I posted is essentially repeated 63,260 x 7 times. 
That's 442,820 times.


The slow down is 
because of the additional steps in compression,


That's arguable. Comparing the time it takes to zip the archive with 
7-zip vs. the time it takes OOo to save the file, I would estimate that 
the compression step takes up maybe 20% of the total time at most.
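
One rough way to check an estimate like this is to time serialization and
compression separately. The sketch below is an assumption-laden stand-in,
not a measurement of OOo or 7-zip: it builds a synthetic table with
hypothetical element names, serializes it with xml.etree.ElementTree, and
compresses the result with zlib (the DEFLATE method commonly used in zip
archives).

```python
import time
import zlib
import xml.etree.ElementTree as ET

def build_tree(rows: int, cols: int) -> ET.Element:
    """Build a synthetic table; the element names are hypothetical."""
    root = ET.Element("table")
    for r in range(rows):
        row = ET.SubElement(root, "row")
        for c in range(cols):
            cell = ET.SubElement(row, "cell", {"value-type": "string"})
            cell.text = f"r{r}c{c}"
    return root

tree = build_tree(rows=5_000, cols=7)

# Step 1: turn the in-memory tree into XML bytes.
start = time.perf_counter()
xml_bytes = ET.tostring(tree, encoding="utf-8")
serialize_s = time.perf_counter() - start

# Step 2: compress those bytes at the default-like level 6.
start = time.perf_counter()
compressed = zlib.compress(xml_bytes, 6)
compress_s = time.perf_counter() - start

share = compress_s / (serialize_s + compress_s)
print(f"uncompressed: {len(xml_bytes):,} bytes")
print(f"compressed:   {len(compressed):,} bytes")
print(f"compression share of save time: {share:.0%}")
```

The share printed here depends heavily on the serializer, the compression
level and the data, so it should not be read as confirming or refuting the
20% figure.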


XML parsing, and the 
fact that OpenDocument files contain more information than CSV files.


Tell me please, Daniel, what extra information is contained in the xml 
snippet:
office:value-type="string">arin


that isn't contained in: ,"arin",




You can't just compare CSV vs OpenDocument and conclude that the problem 
is the size of the XML tags. That's plain silly.


In this particular case, it's not silly at all. If I do some simple 
substitutions and some liberal deleting, I can fairly easily reproduce 
the csv from the ods. And I won't lose a scrap of information in the 
process.
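
A minimal Python sketch of that kind of conversion is shown below. The
sample markup is a hypothetical, heavily trimmed stand-in for a content.xml
table (the real file has more wrapper elements and attributes), and the
function name is made up for illustration. The namespace URIs are the
OpenDocument ones.

```python
import csv
import io
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}

# Hypothetical sample: one row, two cells, no styles or repeated columns.
SAMPLE = """\
<table:table xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
             xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
             xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <table:table-row>
    <table:table-cell office:value-type="string"><text:p>arin</text:p></table:table-cell>
    <table:table-cell office:value-type="float" office:value="42"><text:p>42</text:p></table:table-cell>
  </table:table-row>
</table:table>
"""

def table_to_csv(xml_text: str) -> str:
    """Flatten the displayed text of each table cell into CSV rows."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in root.iterfind("table:table-row", NS):
        values = []
        for cell in row.iterfind("table:table-cell", NS):
            # A cell may hold several paragraphs; join them with newlines.
            paragraphs = [p.text or "" for p in cell.iterfind("text:p", NS)]
            values.append("\n".join(paragraphs))
        writer.writerow(values)
    return out.getvalue()

print(table_to_csv(SAMPLE), end="")
```

The sketch keeps only the displayed text of each cell. It discards the
office:value-type attribute and ignores styles, repeated-cell attributes and
everything outside the table.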


I realize this is not a normal case. It's more like a controlled 
experiment where you remove as many variables as you can in order to 
study the particular phenomenon of interest.


The only conclusion I can make is that XML makes a terrible format for 
databases that look like spreadsheets (or spreadsheets that look like 
databases). Maybe this will spur people to learn how to use Base.


--

Rod






Re: [discuss] Re: Article: OpenDocument vs MS XML

2005-11-26 Thread Daniel Carrera

Randomthots wrote:

The number of characters has no effect on speed. There is no reason 
why  is faster to parse than .


I'm sorry, Daniel, but I find that hard to believe.

I have a file that is strictly text, numbers, and dates. Seven columns 
by 63,260 rows -- no formulas, no formatting. Importing as csv takes a 
few seconds. Converted to ods it takes *much* longer to load -- around 
30 seconds or so.


What makes you think that the reason for the slowdown is because 
OpenDocument uses verbose tags instead of hard to understand tags? The 
size of the tag has essentially *zero* effect on speed. The slow down is 
because of the additional steps in compression, XML parsing, and the 
fact that OpenDocument files contain more information than CSV files.


You can't just compare CSV vs OpenDocument and conclude that the problem 
is the size of the XML tags. That's plain silly.


I just don't understand why it takes over 80 characters to describe a 4 
character text value in a cell with no formatting:


* It's XML.
* Long, descriptive names help ensure correctness.

You're not going to convince me that couldn't be usefully abbreviated in 
some way and that all that doesn't take cycles to process.


Of course it "can" be abbreviated. What I'm saying is that abbreviating it 
is not going to give you the benefit that you think it will. It will not 
speed up parsing, it will not make the file load faster. It will save 
disk space, but I doubt that disk space is the primary concern for most 
people who have documents.


I "get it" about ODF, Daniel, I really, really, do. I'm a supporter. But 
that doesn't mean we can just pretend that disadvantages don't exist. 


Every decision has disadvantages. But the ones you pointed out are 
fictitious. Instead you could complain that a larger file affects 
bandwidth and XML parsing slows things down. At least those are real 
disadvantages. But saying that the size of the XML tag makes the file 
slow to load is not terribly valid.


Cheers,
Daniel.
--
 /\/`) http://oooauthors.org
/\/_/  http://opendocumentfellowship.org
   /\/_/  No trees were harmed in the creation of this email.
   \/_/   However, a significant number of electrons were
   /  were severely inconvenienced.




[discuss] Re: Article: OpenDocument vs MS XML

2005-11-26 Thread Randomthots

Daniel Carrera wrote:



I haven't yet seen any examples of the new Excel format. But verbosity 
isn't really an issue.


[snip]

The number of characters has no effect on speed. There is no reason why 
 is faster to parse than .


I'm sorry, Daniel, but I find that hard to believe.

I have a file that is strictly text, numbers, and dates. Seven columns 
by 63,260 rows -- no formulas, no formatting. Importing as csv takes a 
few seconds. Converted to ods it takes *much* longer to load -- around 
30 seconds or so.


The original csv is 3.945 MB. The content.xml of this file is 44.305 MB. 
A ratio of over 11 to 1. I can't say that it takes eleven times as long 
to load -- I haven't timed it that close -- but it's in the ballpark. 
Keep in mind that at the end of the day, the program has to end up with 
exactly the same data structures and it starts out with basically the 
same information.
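
The figures above can be turned into a per-cell overhead with a little
arithmetic. This is only a sketch: it assumes decimal megabytes and spreads
the whole size difference evenly over the cells, ignoring any fixed header
in content.xml.

```python
# Per-cell markup overhead implied by the file sizes quoted above.
ROWS = 63_260
COLUMNS = 7
CSV_MB = 3.945
CONTENT_XML_MB = 44.305
MB = 1_000_000  # assumption: decimal megabytes

cells = ROWS * COLUMNS
ratio = CONTENT_XML_MB / CSV_MB
overhead_per_cell = (CONTENT_XML_MB - CSV_MB) * MB / cells

print(f"cells: {cells:,}")
print(f"size ratio: {ratio:.1f} to 1")
print(f"extra bytes per cell: {overhead_per_cell:.0f}")
```

Under these assumptions the extra markup comes to roughly 90 bytes per cell,
which is in line with the "over 80 characters" observation.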


I just don't understand why it takes over 80 characters to describe a 4 
character text value in a cell with no formatting:


office:value-type="string">arin


You're not going to convince me that couldn't be usefully abbreviated in 
some way and that all that doesn't take cycles to process.


I "get it" about ODF, Daniel, I really, really, do. I'm a supporter. But 
that doesn't mean we can just pretend that disadvantages don't exist. 
The worst part is that the performance hit is something that the user 
will experience every day, while the advantages may not be so readily 
apparent -- or even applicable at all, depending on how you use the suite.


--

Rod





Re: [discuss] Re: Article: OpenDocument vs MS XML

2005-11-26 Thread Daniel Carrera

Randomthots wrote:

1. Does Microsoft's XML standard now encompass all document types? Last 
I knew they only had an XML format for Word.


Microsoft's FAQ says:

"Currently, only Microsoft Office Word, Microsoft Office Excel, and 
Microsoft Office PowerPoint will use Office XML Formats"


In particular, it doesn't cover InfoPath, Visio, Publisher, etc.


2. If the answer to 1 is "yes", then how does their format for 
spreadsheets compare to OD for verbosity?


I haven't yet seen any examples of the new Excel format. But verbosity 
isn't really an issue.


I probably don't understand this all well enough, but the sheer size of 
OD spreadsheet files (before compression) bothers me. It seems like 
there is an incredible number of characters required to describe each 
cell, which can't help the processing speed any.


The number of characters has no effect on speed. There is no reason why 
 is faster to parse than .


To someone who actually works in XML, the verbosity of OpenDocument is 
welcome because it makes the file format a lot more transparent.


I notice that in the examples cited in the article that MS tends to use 
very short tags like , whereas the OD tags are full words like 
. I realize this aids in human 
readability but most of the time... who cares? I'm not going to be 
reading the raw file anyway.


Please read the top of the article. It explains why you should care 
about which format is understandable. Because the developer who is 
writing the application you want to use needs to understand it and know 
how to use it. And the more understandable the format is, the better the 
support, and the better the compatibility.


Understandability/simplicity/etc has a DIRECT effect on things you do 
care about like how many applications support it, and whether you can 
reasonably expect a file produced by one to be read by another (i.e. 
interoperability).


And interoperability is the whole point of using XML. If you don't care 
about a developer understanding the format, you might as well be using 
Microsoft's .doc.


Using obscure tags like  is gratuitous obscurity. It makes it 
harder for competitors to understand the format and support it for no 
benefit.


Daniel.
--
 /\/`) http://oooauthors.org
/\/_/  http://opendocumentfellowship.org
   /\/_/  No trees were harmed in the creation of this email.
   \/_/   However, a significant number of electrons were
   /  were severely inconvenienced.




[discuss] Re: Article: OpenDocument vs MS XML

2005-11-25 Thread Randomthots

Daniel Carrera wrote:

Hi all,

Excellent article at Groklaw:

http://www.groklaw.net/article.php?story=20051125144611543

It's a technical comparison between OpenDocument and Microsoft's XML 
format. It's intended to be suitable for a semi-technical audience (i.e. 
people who know a bit of HTML) and the focus is on interoperability.


OpenDocument beats MS XML in interoperability hands down. And this 
article explains some of the technical reasons why. I highly recommend it.


Cheers,
Daniel.


Hi Daniel,
   Thanks for the link. Bear with me as I try to formulate my questions...

1. Does Microsoft's XML standard now encompass all document types? Last 
I knew they only had an XML format for Word.


2. If the answer to 1 is "yes", then how does their format for 
spreadsheets compare to OD for verbosity?


I probably don't understand this all well enough, but the sheer size of 
OD spreadsheet files (before compression) bothers me. It seems like 
there is an incredible number of characters required to describe each 
cell, which can't help the processing speed any.


I notice that in the examples cited in the article that MS tends to use 
very short tags like , whereas the OD tags are full words like 
. I realize this aids in human 
readability but most of the time... who cares? I'm not going to be 
reading the raw file anyway.


Anyway, good article.

--

Cheers,

Rod

