Re: [discuss] Re: Article: OpenDocument vs MS XML
On Mon, 28 Nov 2005 03:40, mark wrote:
> Daniel Carrera wrote:
> > Wesley Parish wrote:
> >> I suspect Microsoft dragged over some of their programming gurus from
> >> arcane C/C++-using projects to draft this standard, because it's got
>
> "Arcane"? Uh, you mean like OpenOffice.org's codebase? Or all of Linux?
> Or Firefox?

I'm referring to their (in)famous Hungarian notation - if that's the correct term; it's been a while since I've read those magazines. ;)

(Speaking of codebases, I'm going to try reading konqueror and koffice - while trying to sort out a heap of old Unix and DOS Public Domain source code to make something useful from it ... it's there, it's minuscule in terms of memory usage, and I, being a bear of very small brain, think that small is beautiful <;)

Wesley Parish

> mark "yes, I *am* a programmer"

--
Clinersterton beademung, with all of love - RIP James Blish
-
Mau e ki, he aha te mea nui? You ask, what is the most important thing?
Maku e ki, he tangata, he tangata, he tangata. I reply, it is people, it is people, it is people.

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: [discuss] Re: Article: OpenDocument vs MS XML
On Mon, 28 Nov 2005 00:04:30 -, Henrik Sundberg <[EMAIL PROTECTED]> wrote:
> 2005/11/27, Daniel Carrera <[EMAIL PROTECTED]>:
> > Irrelevant comparison. Document files are not programs. OOo is a 60 MB program, not a 192kb document. OOo does rendering, memory allocation, loads external libraries, runs threads, and does a zillion other things that documents don't do.
>
> I was thinking of the files from the "memory hog" discussions found at http://blogs.zdnet.com/Ou/?p=101
>
> The unzipped XML (SXC) is 286 MB. Almost 5 times larger than OOo. The MS XML equivalent was 193 MB. It is my firm belief that the parsing time of the difference (93 MB) is noticeable. The SXC file was only 3.6 MB, but the uncompressed size has to be traversed in memory at least.
>
> /Henrik

Sorry, I haven't really followed this topic, but I thought I'd throw this out. Federico Mena Quintero is a developer from Ximian who has done a lot of testing of performance on GNOME.

http://primates.ximian.com/~federico/news-2005-10.html#oocalc-performance

There is some really interesting information from when he went through it with sysprof.

--
Alexandro Colorado
CoLeader of OpenOffice.org ES
http://es.openoffice.org
Re: [discuss] Re: Article: OpenDocument vs MS XML
Henrik Sundberg wrote:
> I was thinking of the files from the "memory hog" discussions found at http://blogs.zdnet.com/Ou/?p=101
>
> The unzipped XML (SXC) is 286 MB.

Well... alright, if you have a file that large, the file size makes quite a difference. I was talking about typical files. For something that large I would question the wisdom of using XML at all. A database seems like the right tool. Obviously XML isn't the right tool for every job. You wouldn't want to store images or music in XML.

> Almost 5 times larger than OOo. The MS XML equivalent was 193 MB. It is my firm belief that the parsing time of the difference (93 MB) is noticeable.

The time required to parse 93MB is negligible compared to the time required to *swap* 93MB. Memory and CPUs are several orders of magnitude faster than disc access.

Cheers,
Daniel.
--
    /\/`) http://oooauthors.org
   /\/_/ http://opendocumentfellowship.org
  /\/_/  No trees were harmed in the creation of this email.
  \/_/   However, a significant number of electrons
  /      were severely inconvenienced.
Re: [discuss] Re: Article: OpenDocument vs MS XML
2005/11/27, Daniel Carrera <[EMAIL PROTECTED]>:
> Irrelevant comparison. Document files are not programs. OOo is a 60 MB
> program, not a 192kb document. OOo does rendering, memory allocation,
> loads external libraries, runs threads, and does a zillion other things
> that documents don't do.

I was thinking of the files from the "memory hog" discussions found at http://blogs.zdnet.com/Ou/?p=101

The unzipped XML (SXC) is 286 MB. Almost 5 times larger than OOo. The MS XML equivalent was 193 MB. It is my firm belief that the parsing time of the difference (93 MB) is noticeable. The SXC file was only 3.6 MB, but the uncompressed size has to be traversed in memory at least.

/Henrik
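The gap between the 3.6 MB archive and the 286 MB of unzipped XML can be checked directly, because SXC and OpenDocument files are ZIP archives. The following is a minimal sketch, not something posted in the thread; the function name `archive_sizes` and the stand-in archive are made up for illustration.

```python
import io
import zipfile


def archive_sizes(source):
    """Return {member name: (compressed bytes, uncompressed bytes)}.

    `source` is a path or file-like object for a ZIP-based document,
    for example an .sxc or .ods file.
    """
    with zipfile.ZipFile(source) as archive:
        return {
            info.filename: (info.compress_size, info.file_size)
            for info in archive.infolist()
        }


if __name__ == "__main__":
    # Build a small stand-in archive in memory. A real document would be
    # opened by path instead, e.g. archive_sizes("spreadsheet.sxc").
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("content.xml", "<cell>arin</cell>\n" * 10000)
    buffer.seek(0)
    for name, (packed, unpacked) in archive_sizes(buffer).items():
        print(f"{name}: {packed} bytes compressed, {unpacked} bytes uncompressed")
```

Highly repetitive markup compresses very well, which is why the on-disk size says little about how much text the parser has to walk through.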
Re: [discuss] Re: Article: OpenDocument vs MS XML
Henrik Sundberg wrote:
> I'd say that smaller files are faster than bigger.

The slowdown due to the size increase is infinitesimal. See below for an example. It's like arguing that you should use short variable names in your Python program because that will make the program faster. Anyone who knows how to program knows that that's a stupid idea.

> Memory is slow.

No, memory is fast. Transfer rates of 1,000-2,000 MB/sec mean that for a 50-page document (details below) you can expect to save at most 0.00014 seconds by using smaller tags.

> Disks are slow.

The transfer rate of an IDE disk is on the order of 100 Mbit/second. The INGOTs handbook is a 50-page document with lots of tables. It is 192 KB. So the disk access part of the process contributes 0.015 seconds to the loading time. I just wrote a Perl program to remove all the paragraph and table tags (this is unreasonable of course, since you still have to have some tag). The result was 48 KB. This means that, for this document, using small tags would save you *less* than 0.011 seconds in loading time. And in exchange for that you would get a more buggy program.

> Hashing long strings is slower than hashing short ones (for symbol table look up).

No, symbol lookup for a longer symbol is *not* slower.

> Parsing shorter files takes less time than parsing longer ones.

False. Using a is not slower than .

> It takes longer time to start large programs as well.

Irrelevant comparison. Document files are not programs. OOo is a 60 MB program, not a 192 KB document. OOo does rendering, memory allocation, loads external libraries, runs threads, and does a zillion other things that documents don't do.

Cheers,
Daniel.
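The arithmetic behind these estimates can be reproduced in a few lines. This is only a sketch under stated assumptions: sizes are read as decimal kilobytes, the disk rate is taken as exactly 100 Mbit/s, and the memory rate as the lower quoted figure of 1,000 MB/s. The helper name `transfer_seconds` is made up for illustration.

```python
def transfer_seconds(size_bytes, rate_bytes_per_second):
    """Time to move size_bytes at a constant transfer rate."""
    return size_bytes / rate_bytes_per_second


KB = 1000                  # assumption: decimal kilobytes
DISK_RATE = 100e6 / 8      # 100 Mbit/s expressed in bytes per second
MEMORY_RATE = 1000e6       # 1,000 MB/s, the lower figure quoted in the message

full_size = 192 * KB       # the handbook as described
stripped_size = 48 * KB    # after removing paragraph and table tags
tag_bytes = full_size - stripped_size

if __name__ == "__main__":
    print(f"disk, whole file:  {transfer_seconds(full_size, DISK_RATE):.4f} s")
    print(f"disk, tags only:   {transfer_seconds(tag_bytes, DISK_RATE):.4f} s")
    print(f"memory, tags only: {transfer_seconds(tag_bytes, MEMORY_RATE):.6f} s")
```

Under these assumptions the three values come out near 0.015 s, 0.012 s and 0.00014 s, so the order of magnitude in the message holds even if the exact rounding differs slightly.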
Re: [discuss] Re: Article: OpenDocument vs MS XML
2005/11/26, Daniel Carrera <[EMAIL PROTECTED]>:
> Of course it "can" be abbreviated. What I'm saying is that abbreviating it
> is not going to give you the benefit that you think it will. It will not
> speed up parsing, it will not make the file load faster. It will save
> disk space, but I doubt that disk space is the primary concern for most
> people who have documents.

I'd say that smaller files are faster than bigger. Of course they are. Memory is slow. Disks are slow. Hashing long strings is slower than hashing short ones (for symbol table look up). Parsing shorter files takes less time than parsing longer ones. It takes longer time to start large programs as well.

The effect of this ought to be fairly easy to check with any XML parser. Create a large (so the parsing time is easy to measure) XML file with a few different short tags and time the parser. Replace all tags with long names and check the parsing time again. Repeat the tests a few times to get more reliable values.

/Henrik
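The experiment proposed here is straightforward to sketch with Python's standard `xml.etree.ElementTree` parser. This is an illustration only: the tag names, the row count, and the helper names `build_document` and `best_parse_time` are assumptions, and a single parser on a single machine says nothing definitive about how OOo behaves.

```python
import time
import xml.etree.ElementTree as ET


def build_document(tag, rows):
    """Build an XML string with `rows` child elements named `tag`."""
    body = "".join(f"<{tag}>arin</{tag}>" for _ in range(rows))
    return f"<root>{body}</root>"


def best_parse_time(document, repeats=5):
    """Parse `document` several times and return the fastest run in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        ET.fromstring(document)
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    rows = 200_000
    # Hypothetical short and long tag names, chosen only for the comparison.
    for tag in ("c", "table-cell-with-a-long-descriptive-name"):
        document = build_document(tag, rows)
        seconds = best_parse_time(document)
        print(f"{tag!r}: {len(document)} bytes, {seconds:.3f} s")
```

Taking the fastest of several runs reduces noise from other processes, which is the point of repeating the test as suggested above.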
Re: [discuss] Re: Article: OpenDocument vs MS XML
Daniel Carrera wrote:
> Wesley Parish wrote:
>> I suspect Microsoft dragged over some of their programming gurus from arcane C/C++-using projects to draft this standard, because it's got

"Arcane"? Uh, you mean like OpenOffice.org's codebase? Or all of Linux? Or Firefox?

mark "yes, I *am* a programmer"

--
FDR: We have nothing to fear but fear itself.
GWB: Be afwaid. Be vewwy afwaid.
Re: [discuss] Re: Article: OpenDocument vs MS XML
Randomthots wrote:
> That's arguable. Comparing the time it takes to zip the archive with 7-zip vs. the time it takes OOo to save the file, I would estimate that the compression step takes up maybe 20% of the total time at most.

20% would have been my guess. I never thought that the zipping step was dominant. I would guess that the other 80% is mostly due to a combination of (1) XML parsing and (2) OOo just being slow. Incidentally, I made a PmWiki plugin to export wiki pages to OpenDocument. It is much faster than OOo for comparable documents.

> Tell me please, Daniel, what extra information is contained in the xml snippet: office:value-type="string">arin that isn't contained in: ,"arin",

I think this is a bad example. You purposely picked a cell that had as little information as possible. But even in your example, you can see that there is a paragraph (and not two or three), that the element is a cell belonging to a table (as opposed to a header, or a drawing), and that the cell has a type. There is also style information in content.xml, styles.xml, settings.xml and meta.xml that includes the cell properties (border, size, width, font, author, date, revisions - if any, etc.)

Of course, if this additional information is not interesting to you, and you are not interested in being able to have additional information beyond what can be contained in a CSV file, then you are better off using CSV files. But it is unfair to blame OpenDocument for not being as fast or as small as another format that is more specialized and nowhere near as "powerful".

It's like when people send you a Word attachment with just text. You can complain that the Word file is unnecessary, but you are not surprised that it's bigger than the plain text version. And you don't claim that .doc files should be as small as .txt files for the same content.

But all this is still a strawman you built, because what started this thread was your claim that small tags would make OOo faster. They won't.

>> You can't just compare CSV vs OpenDocument and conclude that the problem is the size of the XML tags. That's plain silly.
> In this particular case, it's not silly at all.

It is, because there are many other things that could be causing the problems you experience and you just picked one at random.

> I realize this is not a normal case.

Indeed, it is not. It is also not related to your original claim, that a smaller tag was better. It's like saying that your Python programs would run faster if you used shorter variable names. It's a silly "optimization". Instead you should look at how the program is designed. Every programmer knows that those silly optimizations do more harm than good.

> It's more like a controlled experiment where you remove as many variables as you can in order to study the particular phenomenon of interest.

No, it's more like a strawman when you claim that smaller tags would make OOo faster and use CSV to "prove" your claim.

Cheers,
Daniel.
Re: [discuss] Re: Article: OpenDocument vs MS XML
Wesley Parish wrote:
> I suspect Microsoft dragged over some of their programming gurus from arcane C/C++-using projects to draft this standard, because it's got the feeling of the Microsoft Standard variable-naming procedures that I've seen discussed in various programming magazines here and there.

A lot of people suspect that they just made an XML dump of their DOM objects. That would be a very lazy way to make an XML format. Of course it misses the whole point of XML, but why should they care?

Cheers,
Daniel.
Re: [discuss] Re: Article: OpenDocument vs MS XML
On Sat, 26 Nov 2005 23:22, Daniel Carrera wrote:
> Randomthots wrote:
> > 1. Does Microsoft's XML standard now encompass all document types? Last
> > I knew they only had an XML format for Word.
>
> Microsoft's FAQ says:
>
> > I notice that in the examples cited in the article that MS tends to use
> > very short tags like , whereas the OD tags are full words like
> > . I realize this aids in human
> > readability but most of the time... who cares? I'm not going to be
> > reading the raw file anyway.
>
> Please read the top of the article. It explains why you should care
> about which format is understandable. Because the developer who is
> writing the application you want to use needs to understand it and know
> how to use it. And the more understandable the format is, the better the
> support, and the better the compatibility.
>
> Understandability/simplicity/etc has a DIRECT effect on things you do
> care about like how many applications support it, and whether you can
> reasonably expect a file produced by one to be read by another (i.e.
> interoperability).
>
> And interoperability is the whole point of using XML. If you don't care
> about a developer understanding the format, you might as well be using
> Microsoft's .doc.
>
> Using obscure tags like is gratuitous obscurity. It makes it
> harder for competitors to understand the format and support it for no
> benefit.
>
> Daniel.

In particular, anyone who's only ever used HTML before would find himself/herself comfortable with ODF very quickly.

I suspect Microsoft dragged over some of their programming gurus from arcane C/C++-using projects to draft this standard, because it's got the feeling of the Microsoft Standard variable-naming procedures that I've seen discussed in various programming magazines here and there.

Be that as it may, it's not the way the various markup languages have been designed and taught, with a focus on simplicity and clarity of expression. It's their problem, not ours.

Wesley Parish
[discuss] Re: Article: OpenDocument vs MS XML
Daniel Carrera wrote:
> Randomthots wrote:
>>> The number of characters has no effect on speed. There is no reason why is faster to parse than .
>> I'm sorry, Daniel, but I find that hard to believe. I have a file that is strictly text, numbers, and dates. Seven columns by 63,260 rows -- no formulas, no formatting. Importing as csv takes a few seconds. Converted to ods it takes *much* longer to load -- around 30 seconds or so.
>
> What makes you think that the reason for the slowdown is because OpenDocument uses verbose tags instead of hard to understand tags? The size of the tag has essentially *zero* effect on speed.

For one particular tag, or for a normally sized spreadsheet, I'm sure you're right. But even a little bit has to add up. In that particular file the tag sequence I posted is essentially repeated 63,260 x 7 times. That's 442,820 times.

> The slowdown is because of the additional steps in compression,

That's arguable. Comparing the time it takes to zip the archive with 7-zip vs. the time it takes OOo to save the file, I would estimate that the compression step takes up maybe 20% of the total time at most.

> XML parsing, and the fact that OpenDocument files contain more information than CSV files.

Tell me please, Daniel, what extra information is contained in the xml snippet: office:value-type="string">arin that isn't contained in: ,"arin",

> You can't just compare CSV vs OpenDocument and conclude that the problem is the size of the XML tags. That's plain silly.

In this particular case, it's not silly at all. If I do some simple substitutions and some liberal deleting, I can fairly easily reproduce the csv from the ods. And I won't lose a scrap of information in the process.

I realize this is not a normal case. It's more like a controlled experiment where you remove as many variables as you can in order to study the particular phenomenon of interest.

The only conclusion I can make is that XML makes a terrible format for databases that look like spreadsheets (or spreadsheets that look like databases). Maybe this will spur people to learn how to use Base.

--
Rod
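The claim that the CSV can be recovered from the ODS markup can be illustrated with a short sketch. It assumes the usual OpenDocument cell layout (a `table:table-cell` holding `text:p` paragraphs); the sample fragment is hand-written for this example and is not taken from the file discussed, and the function name `table_to_csv` is made up. Repeated-column attributes and other spreadsheet features are ignored.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace URIs as defined by the OpenDocument specification.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}

# Assumed sample: one row with a string cell and a numeric cell.
SAMPLE = (
    '<table:table xmlns:office="{office}" xmlns:table="{table}" xmlns:text="{text}">'
    "<table:table-row>"
    '<table:table-cell office:value-type="string"><text:p>arin</text:p></table:table-cell>'
    '<table:table-cell office:value-type="float" office:value="42">'
    "<text:p>42</text:p></table:table-cell>"
    "</table:table-row>"
    "</table:table>"
).format(**NS)


def table_to_csv(xml_text):
    """Write the displayed text of every cell in every row as CSV."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in root.iter("{%s}table-row" % NS["table"]):
        cells = row.findall("table:table-cell", NS)
        writer.writerow(
            [
                # Join all paragraphs of a cell; most cells have exactly one.
                " ".join("".join(p.itertext()) for p in cell.findall("text:p", NS))
                for cell in cells
            ]
        )
    return out.getvalue()


if __name__ == "__main__":
    print(table_to_csv(SAMPLE), end="")
```

Note that the conversion drops the `office:value-type` attribute, which is the kind of extra information the other side of the argument points to.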
Re: [discuss] Re: Article: OpenDocument vs MS XML
Randomthots wrote:
>> The number of characters has no effect on speed. There is no reason why is faster to parse than .
> I'm sorry, Daniel, but I find that hard to believe. I have a file that is strictly text, numbers, and dates. Seven columns by 63,260 rows -- no formulas, no formatting. Importing as csv takes a few seconds. Converted to ods it takes *much* longer to load -- around 30 seconds or so.

What makes you think that the reason for the slowdown is because OpenDocument uses verbose tags instead of hard to understand tags? The size of the tag has essentially *zero* effect on speed. The slowdown is because of the additional steps in compression, XML parsing, and the fact that OpenDocument files contain more information than CSV files. You can't just compare CSV vs OpenDocument and conclude that the problem is the size of the XML tags. That's plain silly.

> I just don't understand why it takes over 80 characters to describe a 4 character text value in a cell with no formatting:

* It's XML.
* Long, descriptive names help ensure correctness.

> You're not going to convince me that couldn't be usefully abbreviated in some way and that all that doesn't take cycles to process.

Of course it "can" be abbreviated. What I'm saying is that abbreviating it is not going to give you the benefit that you think it will. It will not speed up parsing, it will not make the file load faster. It will save disk space, but I doubt that disk space is the primary concern for most people who have documents.

> I "get it" about ODF, Daniel, I really, really, do. I'm a supporter. But that doesn't mean we can just pretend that disadvantages don't exist.

Every decision has disadvantages. But the ones you pointed out are fictitious. Instead you could complain that a larger file affects bandwidth and XML parsing slows things down. At least those are real disadvantages. But saying that the size of the XML tag makes the file slow to load is not terribly valid.

Cheers,
Daniel.
[discuss] Re: Article: OpenDocument vs MS XML
Daniel Carrera wrote:
> I haven't yet seen any examples of the new Excel format. But verbosity isn't really an issue.
<snip>
> The number of characters has no effect on speed. There is no reason why is faster to parse than .

I'm sorry, Daniel, but I find that hard to believe. I have a file that is strictly text, numbers, and dates. Seven columns by 63,260 rows -- no formulas, no formatting. Importing as csv takes a few seconds. Converted to ods it takes *much* longer to load -- around 30 seconds or so.

The original csv is 3.945 MB. The content.xml of this file is 44.305 MB. A ratio of over 11 to 1. I can't say that it takes eleven times as long to load -- I haven't timed it that closely -- but it's in the ballpark. Keep in mind that at the end of the day, the program has to end up with exactly the same data structures and it starts out with basically the same information.

I just don't understand why it takes over 80 characters to describe a 4 character text value in a cell with no formatting:

office:value-type="string">arin

You're not going to convince me that couldn't be usefully abbreviated in some way and that all that doesn't take cycles to process.

I "get it" about ODF, Daniel, I really, really, do. I'm a supporter. But that doesn't mean we can just pretend that disadvantages don't exist. The worst part is that the performance hit is something that the user will experience every day, while the advantages may not be so readily apparent -- or even applicable at all, depending on how you use the suite.

--
Rod
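The sizes quoted in this message can be turned into a per-cell figure with a little arithmetic. This is a sketch that assumes "MB" means 10^6 bytes; the helper name `bytes_per_cell` is made up for illustration.

```python
ROWS = 63_260
COLUMNS = 7
CSV_MB = 3.945
CONTENT_XML_MB = 44.305

cells = ROWS * COLUMNS
ratio = CONTENT_XML_MB / CSV_MB


def bytes_per_cell(total_mb, cell_count):
    """Average bytes per cell, assuming 1 MB = 1,000,000 bytes."""
    return total_mb * 1_000_000 / cell_count


if __name__ == "__main__":
    print(f"cells: {cells}")
    print(f"content.xml / csv ratio: {ratio:.2f}")
    print(f"csv bytes per cell: {bytes_per_cell(CSV_MB, cells):.1f}")
    print(f"xml bytes per cell: {bytes_per_cell(CONTENT_XML_MB, cells):.1f}")
```

On these numbers the XML averages roughly 100 bytes per cell against about 9 for the CSV, which is consistent with the "over 80 characters" of markup per value described in the message.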
Re: [discuss] Re: Article: OpenDocument vs MS XML
Randomthots wrote:
> 1. Does Microsoft's XML standard now encompass all document types? Last I knew they only had an XML format for Word.

Microsoft's FAQ says:

"Currently, only Microsoft Office Word, Microsoft Office Excel, and Microsoft Office PowerPoint will use Office XML Formats"

In particular, it doesn't cover InfoPath, Visio, Publisher, etc.

> 2. If the answer to 1 is "yes", then how does their format for spreadsheets compare to OD for verbosity?

I haven't yet seen any examples of the new Excel format. But verbosity isn't really an issue.

> I probably don't understand this all well enough, but the sheer size of OD spreadsheet files (before compression) bothers me. It seems like there is an incredible number of characters required to describe each cell, which can't help the processing speed any.

The number of characters has no effect on speed. There is no reason why is faster to parse than . To someone who actually works in XML, the verbosity of OpenDocument is welcome because it makes the file format a lot more transparent.

> I notice that in the examples cited in the article that MS tends to use very short tags like , whereas the OD tags are full words like . I realize this aids in human readability but most of the time... who cares? I'm not going to be reading the raw file anyway.

Please read the top of the article. It explains why you should care about which format is understandable: because the developer who is writing the application you want to use needs to understand it and know how to use it. And the more understandable the format is, the better the support, and the better the compatibility.

Understandability/simplicity/etc has a DIRECT effect on things you do care about like how many applications support it, and whether you can reasonably expect a file produced by one to be read by another (i.e. interoperability).

And interoperability is the whole point of using XML. If you don't care about a developer understanding the format, you might as well be using Microsoft's .doc.

Using obscure tags like is gratuitous obscurity. It makes it harder for competitors to understand the format and support it for no benefit.

Daniel.
[discuss] Re: Article: OpenDocument vs MS XML
Daniel Carrera wrote:
> Hi all,
>
> Excellent article at Groklaw:
>
> http://www.groklaw.net/article.php?story=20051125144611543
>
> It's a technical comparison between OpenDocument and Microsoft's XML format. It's intended to be suitable for a semi-technical audience (i.e. people who know a bit of HTML) and the focus is on interoperability.
>
> OpenDocument beats MS XML in interoperability hands down. And this article explains some of the technical reasons why. I highly recommend it.
>
> Cheers,
> Daniel.

Hi Daniel,

Thanks for the link. Bear with me as I try to formulate my questions...

1. Does Microsoft's XML standard now encompass all document types? Last I knew they only had an XML format for Word.

2. If the answer to 1 is "yes", then how does their format for spreadsheets compare to OD for verbosity?

I probably don't understand this all well enough, but the sheer size of OD spreadsheet files (before compression) bothers me. It seems like there is an incredible number of characters required to describe each cell, which can't help the processing speed any.

I notice that in the examples cited in the article that MS tends to use very short tags like , whereas the OD tags are full words like . I realize this aids in human readability but most of the time... who cares? I'm not going to be reading the raw file anyway.

Anyway, good article.

--
Cheers,
Rod