Re: [Libreoffice] Proposal: slowcheck -- some numbers for sc
On Thursday 22 of September 2011, Kohei Yoshida wrote:
> On Thu, 2011-09-22 at 17:09 +0100, Caolán McNamara wrote:
> > Seems to spend all its time getting and setting optimal row
> > heights/widths or something like that in ods/xlsx mode.
>
> Yes. That's an infamous bottleneck. Calc's row height adjustment on
> file load is pretty darn expensive. It basically re-calculates the
> heights of rows in all rows regardless.
...
> Not sure if ods stores row heights, but if it does, then we can do the
> same thing for ods too. That's worth a check.

Even if ods does not officially store it, presumably you can have an
LO-specific field where LO could cache the value for faster reading
anyway, no?

--
Lubos Lunak
l.lu...@suse.cz
___
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice
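Lubos's suggestion boils down to "use a stored height when the file provides one, otherwise compute it". A minimal C++ sketch of that fallback, assuming a hypothetical per-row record with an optional cached height; `RowRecord`, `cachedHeight`, `resolveRowHeights` and the `measure` callback are illustrative names, not Calc's import API:

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical per-row data as read from the file. The stored height is
// optional because not every format (or every writer) provides one.
struct RowRecord {
    std::optional<int> cachedHeight;  // height found in the file, if any
};

// Returns the final row heights. The expensive `measure` callback (standing
// in for a layout pass) is only invoked for rows without a stored height.
std::vector<int> resolveRowHeights(const std::vector<RowRecord>& rows,
                                   const std::function<int(std::size_t)>& measure,
                                   std::size_t& measuredCount) {
    std::vector<int> heights;
    heights.reserve(rows.size());
    measuredCount = 0;
    for (std::size_t i = 0; i < rows.size(); ++i) {
        if (rows[i].cachedHeight) {
            heights.push_back(*rows[i].cachedHeight);  // trust the file
        } else {
            heights.push_back(measure(i));  // fall back to measuring
            ++measuredCount;
        }
    }
    return heights;
}
```

One question this leaves open is when a cached value can be trusted, e.g. if another application edited the cells without updating the LO-specific field.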
Re: [Libreoffice] Proposal: slowcheck -- some numbers for sc
On Thu, 2011-09-22 at 17:09 +0100, Caolán McNamara wrote:
> On Wed, 2011-09-21 at 11:28 +0100, Michael Meeks wrote:
> > It'd be really interesting to get a callgrind trace of ods / xlsx loading.
>
> oh, as an aside, this is easy to get with
> cd sc
> export VALGRIND=callgrind
> make -sr
>
> Here's mine FWIW http://www.skynet.ie/~caolan/pub/tmp/callgrind.out.4171
>
> Seems to spend all its time getting and setting optimal row
> heights/widths or something like that in ods/xlsx mode.

Yes. That's an infamous bottleneck. Calc's row height adjustment on
file load is pretty darn expensive. It basically re-calculates the
heights of rows in all rows regardless.

The xls import used to suffer the same thing, but we've replaced that
with using the row heights stored in the xls directly, which
coincidentally improved the layout preservation with Excel documents as
well (and loads much quicker).

In theory, we could skip that for xlsx too, by doing the same thing we
did for xls, since xlsx too stores the row height with the document. Not
sure if ods stores row heights, but if it does, then we can do the same
thing for ods too. That's worth a check.

As an aside, dbf and csv import had the same issue, but I've reduced the
need to re-calc row height to only those rows that really need
re-calculation. Needless to say that resulted in a much faster import,
especially for large dbf and csv documents. Much faster.

Kohei
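Kohei's dbf/csv change, re-calculating only the rows that really need it, can be pictured as a dirty-row pass. This is a hypothetical sketch, assuming the importer flags a row whenever it writes content whose height cannot be known without layout (multi-line text, for instance); `RowHeightCache`, `markDirty` and `recalcDirty` are made-up names, not Calc code:

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical row-height store with a per-row "needs recalculation" flag.
class RowHeightCache {
public:
    RowHeightCache(std::size_t rowCount, int defaultHeight)
        : heights_(rowCount, defaultHeight), dirty_(rowCount, false) {}

    // Called by the importer when a row receives content that may change
    // its height.
    void markDirty(std::size_t row) { dirty_.at(row) = true; }

    // Re-measures only the flagged rows and clears their flags.
    // Returns how many rows were measured.
    std::size_t recalcDirty(const std::function<int(std::size_t)>& measure) {
        std::size_t count = 0;
        for (std::size_t row = 0; row < heights_.size(); ++row) {
            if (!dirty_[row]) continue;  // untouched rows keep their height
            heights_[row] = measure(row);
            dirty_[row] = false;
            ++count;
        }
        return count;
    }

    int height(std::size_t row) const { return heights_.at(row); }

private:
    std::vector<int> heights_;
    std::vector<bool> dirty_;
};
```

The loop still visits every row, but testing a flag is cheap compared with measuring text; a sorted set of dirty row indices would avoid even that scan for very large, mostly clean sheets.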
Re: [Libreoffice] Proposal: slowcheck -- some numbers for sc
On 21.09.2011 13:08, Lubos Lunak wrote:
> On Wednesday 21 of September 2011, Michael Meeks wrote:
>> On Wed, 2011-09-21 at 09:50 +0100, Caolán McNamara wrote:
>>> In an ideal world I imagine the best spent effort would be on improving
>>> the import speed for .ods and .xlsx, seeing as that improves the
>>> real-world case too.
>>
>> Agreed. Assuming that the files are of equivalent on-disk size, it's
>> amusing that the old code is fastest and the newest slowest. It'd be
>> really interesting to get a callgrind trace of ods / xlsx loading.

well nobody ever claimed that XML is faster to parse than binary gunk;
it is, however, far easier to import in C/C++ without accidentally
compromising the user's system in the process :-/

> Amusing maybe, but not strange at all. XLS is binary format, ODS and XLSX
> are XML-based. Any guesses on how much slower image loading would be if
> somebody came up with JPEGX? And, if calc import filters are written in at
> least somewhat similar way to writer import filters, then XML-based filters
> get additional penalty for XSL processing and abstractions, especially in a
> non-optimized build.

AFAIK importing an ODF or OOXML file won't use any XSLT stuff; that is
only for less popular formats.

--
"This article, then, is a serious analysis of a ridiculous subject,
which is of course the opposite of what is usual in economics."
 -- Paul Krugman, "The Theory Of Interstellar Trade",
 Economic Inquiry, Vol. 48(4), p. 1119-1123
Re: [Libreoffice] Proposal: slowcheck -- some numbers for sc
On Wed, 2011-09-21 at 11:28 +0100, Michael Meeks wrote:
> It'd be really interesting to get a callgrind trace of ods / xlsx loading.

oh, as an aside, this is easy to get with

cd sc
export VALGRIND=callgrind
make -sr

Here's mine FWIW http://www.skynet.ie/~caolan/pub/tmp/callgrind.out.4171

Seems to spend all its time getting and setting optimal row
heights/widths or something like that in ods/xlsx mode.

C.
Re: [Libreoffice] Proposal: slowcheck -- some numbers for sc
On Wed, 2011-09-21 at 11:28 +0100, Michael Meeks wrote:
> On Wed, 2011-09-21 at 09:50 +0100, Caolán McNamara wrote:
> > So I've now added a timer listener to our cppunit launcher to tell us
> > how long each test takes[1] and indeed it takes about 70ms to load a
> > .xls, 370ms for an equivalent .ods and about 500ms(?!) for an
> > equivalent .xlsx
>
> Ooh :-) pretty numbers indeed, how can I reproduce them ? I notice they
> didn't go to the console (which is a shame), and couldn't see them in
> the sc_filters_test.log file either (oddly).

cd sal
edit cppunittester/cppunittester.cxx, define TIMETESTS
build (you should see the times for the sal cppunit tests on stdout,
because those ones are not gbuild-ified)
deliver -link
cd sc
make -sr
grep ms ../workdir/unxlngx6/CppunitTest/sc_filters_test.test.log

FiltersTest::testRangeName: 525ms
FiltersTest::testContent: 1146ms
FiltersTest::testFunctions: 411ms
FiltersTest::testDatabaseRanges: 424ms
FiltersTest::testFormats: 1039ms
FiltersTest::testBugFixesODS: 408ms
FiltersTest::testBugFixesXLS: 61ms
FiltersTest::testBugFixesXLSX: 350ms

For my claim about the times per format, I edited FiltersTest::testFormats
and changed the loop three times to just get three times for
FiltersTest::testFormats as if it did just one format. Those tally fairly
close to the individual format FiltersTest::testBugFixesXX times shown
above.

The times are spat out to stdout, though in the gbuildified modules the
output is only shown if a test fails, at the gbuild layer.

> > In an ideal world I imagine the best spent effort would be on improving
> > the import speed for .ods and .xlsx, seeing as that improves the
> > real-world case too.
>
> Agreed. Assuming that the files are of equivalent on-disk size.

If someone has the time, sticking a basically empty .ods/.xlsx file in
there would also be worth following up, i.e. how long does loading an
empty one take :-) It all might be a bit easier to hack some of the
performance things at the stripped-down unittest loading level.

And my times are with --enable-dbgutil, so they may be totally irrelevant
for real-world .ods/xlsx loading, even if obviously relevant for
dbgutil-using hacking.

C.
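The `grep ms` step above just filters the per-test lines that a `TIMETESTS` build writes, which have the shape `FiltersTest::testBugFixesXLS: 61ms`. If those lines need post-processing (say, to compare formats across runs), a small parser is enough. A sketch assuming exactly that `<name>: <digits>ms` shape; `parseTimingLine` is a hypothetical helper:

```cpp
#include <cctype>
#include <string>

// Parses a line such as "FiltersTest::testBugFixesXLS: 61ms".
// Returns false if the line does not have the "<name>: <digits>ms" shape.
bool parseTimingLine(const std::string& line, std::string& name, long& ms) {
    // Use the last ": " so that the "::" inside the test name is not
    // mistaken for the separator.
    const std::size_t sep = line.rfind(": ");
    if (sep == std::string::npos || line.size() < sep + 4) return false;
    if (line.compare(line.size() - 2, 2, "ms") != 0) return false;

    // Everything between ": " and the trailing "ms" must be digits.
    const std::string digits = line.substr(sep + 2, line.size() - sep - 4);
    if (digits.empty()) return false;
    for (char c : digits)
        if (!std::isdigit(static_cast<unsigned char>(c))) return false;

    name = line.substr(0, sep);
    ms = std::stol(digits);
    return true;
}
```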
Re: [Libreoffice] Proposal: slowcheck -- some numbers for sc
On Wednesday 21 of September 2011, Michael Meeks wrote:
> On Wed, 2011-09-21 at 09:50 +0100, Caolán McNamara wrote:
> > In an ideal world I imagine the best spent effort would be on improving
> > the import speed for .ods and .xlsx, seeing as that improves the
> > real-world case too.
>
> Agreed. Assuming that the files are of equivalent on-disk size, it's
> amusing that the old code is fastest and the newest slowest. It'd be
> really interesting to get a callgrind trace of ods / xlsx loading.

Amusing maybe, but not strange at all. XLS is binary format, ODS and XLSX
are XML-based. Any guesses on how much slower image loading would be if
somebody came up with JPEGX? And, if calc import filters are written in at
least somewhat similar way to writer import filters, then XML-based filters
get additional penalty for XSL processing and abstractions, especially in a
non-optimized build.

--
Lubos Lunak
l.lu...@suse.cz
Re: [Libreoffice] Proposal: slowcheck -- some numbers for sc
Hello Caolan, Michael,

2011/9/21 Michael Meeks
>
> On Wed, 2011-09-21 at 09:50 +0100, Caolán McNamara wrote:
> > So I've now added a timer listener to our cppunit launcher to tell us
> > how long each test takes[1] and indeed it takes about 70ms to load a
> > .xls, 370ms for an equivalent .ods and about 500ms(?!) for an
> > equivalent .xlsx
>
> Ooh :-) pretty numbers indeed, how can I reproduce them ? I notice they
> didn't go to the console (which is a shame), and couldn't see them in
> the sc_filters_test.log file either (oddly).

I'm not really surprised by these numbers - maybe that xls is that fast -
but xlsx and ods use way too many slow uno calls.

> > In an ideal world I imagine the best spent effort would be on improving
> > the import speed for .ods and .xlsx, seeing as that improves the
> > real-world case too.

I don't agree here. In my opinion it would be best to have them separate,
but only test universal-content.ods and maybe one or two other major
features in the normal build target. All the other files can be tested in
the proposed target. Mixing all features in one or two files results in a
big clump that can never be extended. My idea was to have a file for every
major feature in each of the three main filters.

And I don't think that using only one filter is a good idea either. We had
too many examples in 3-4 where changing parts of a feature resulted in
misbehaviour in just one filter's code.

I totally agree with you that too much time is spent on unit tests in a
normal build, but at least a calc dev should run all of them before
pushing his changes.

Regards,
Markus
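Markus's split can be expressed as two lists of test documents: one loaded on every build and one loaded only by the proposed slowcheck target. A hypothetical sketch; apart from `universal-content.ods`, the file names and the `slowcheck` flag are illustrative assumptions, not the actual sc test layout:

```cpp
#include <string>
#include <vector>

// Documents loaded on every build: kept small so normal builds stay fast.
const std::vector<std::string> kQuickFiles = {"universal-content.ods"};

// One document per major feature and per filter (hypothetical names),
// loaded only when the slow target is requested.
const std::vector<std::string> kSlowFiles = {
    "range-name.ods",      "range-name.xls",      "range-name.xlsx",
    "database-ranges.ods", "database-ranges.xls", "database-ranges.xlsx",
};

std::vector<std::string> selectTestFiles(bool slowcheck) {
    std::vector<std::string> files = kQuickFiles;
    if (slowcheck)
        files.insert(files.end(), kSlowFiles.begin(), kSlowFiles.end());
    return files;
}
```

Keeping one file per feature and per filter preserves Markus's point that a regression can show up in just one filter, while the normal build only pays for the single quick document.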
Re: [Libreoffice] Proposal: slowcheck -- some numbers for sc
On Wed, 2011-09-21 at 09:50 +0100, Caolán McNamara wrote:
> So I've now added a timer listener to our cppunit launcher to tell us
> how long each test takes[1] and indeed it takes about 70ms to load a
> .xls, 370ms for an equivalent .ods and about 500ms(?!) for an
> equivalent .xlsx

Ooh :-) pretty numbers indeed, how can I reproduce them ? I notice they
didn't go to the console (which is a shame), and couldn't see them in
the sc_filters_test.log file either (oddly).

> In an ideal world I imagine the best spent effort would be on improving
> the import speed for .ods and .xlsx, seeing as that improves the
> real-world case too.

Agreed. Assuming that the files are of equivalent on-disk size, it's
amusing that the old code is fastest and the newest slowest. It'd be
really interesting to get a callgrind trace of ods / xlsx loading.

Hey ho,

Michael.

--
michael.me...@novell.com <><, Pseudo Engineer, itinerant idiot
Re: [Libreoffice] Proposal: slowcheck -- some numbers for sc
On Tue, 2011-09-20 at 11:27 +0200, Markus Mohrhard wrote:
> The sc unit tests are slow because we use the filter-tests as a way
> to test not only the import but also the first recalculation. We
> have now already about 10 files that we import and check and I have
> some additional that are not yet finished.

So, here are my numbers from sc: --enable-dbgutil build, linux, x86_64.

I did the make non-parallel; in -jX mode the two sc unit tests would be
run in parallel.

a) all already built, make in sc, i.e. check depends and do nothing
time make -sr build
real    0m4.115s
So there's a 4 second make setup cost there already for me.

b) rebuild one file, i.e. ccache + link
touch source/core/data/attarray.cxx
time make -sr build
real    0m43.462s
Ouch, but as expected, linking isn't very fast.

c) all built, make sc, i.e. check depends and run tests
time make -sr
real    0m11.275s
So there's apparently about 7 seconds spent in the tests. That's quite
long alright: much less than a link, but fairly long.

So I've now added a timer listener to our cppunit launcher to tell us
how long each test takes[1] and indeed it takes about 70ms to load a
.xls, 370ms for an equivalent .ods and about 500ms(?!) for an
equivalent .xlsx.

So the problem isn't that loading files is inherently slow, but that
loading .ods and .xlsx files is slow [2].

In an ideal world I imagine the best spent effort would be on improving
the import speed for .ods and .xlsx, seeing as that improves the
real-world case too.

As a quick hack: it's possibly a fixed overhead cost per filter, so
bundling the various .ods and .xlsx tests together into one file per
format might knock a lot off the unit test time. It may then be the case
that additional content added to a single .ods/.xlsx is a tiny cost
relative to the fixed cost of loading one in the first place.
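The bundling argument rests on a simple cost model. Assuming (not measured) that each load costs a fixed per-filter overhead $c_f$ plus a content-dependent part $c_i$ for file $i$, loading $N$ separate files versus one bundled file gives:

```latex
T_{\text{separate}} = \sum_{i=1}^{N} \left(c_f + c_i\right) = N\,c_f + \sum_{i=1}^{N} c_i,
\qquad
T_{\text{bundled}} \approx c_f + \sum_{i=1}^{N} c_i,
\qquad
T_{\text{separate}} - T_{\text{bundled}} \approx (N-1)\,c_f .
```

So bundling only pays off to the extent that $c_f$ dominates; timing the load of an essentially empty .ods/.xlsx, as suggested elsewhere in the thread, would estimate $c_f$ directly.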
Or stick to xls for tests, that filter is faster :-)

[1] you need to define TIMETESTS in cppunittester/cppunittester.cxx to
get the timing information; maybe this is worth hooking off some command
line option or env variable ?
[2] in --enable-dbgutil mode anyway

C.
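Footnote [1] suggests driving the timing from a command line option or environment variable instead of the compile-time `TIMETESTS` define. A minimal sketch of both halves, with loudly labeled assumptions: the real CppUnit `TestListener` has `startTest(CppUnit::Test*)` and `endTest(CppUnit::Test*)` hooks, but to stay self-contained this class takes the test name as a string, and the `CPPUNIT_TIMETESTS` variable name is hypothetical:

```cpp
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <string>

// Hypothetical runtime switch standing in for the compile-time TIMETESTS
// define: timing is on when CPPUNIT_TIMETESTS is set to anything but "0".
bool timingEnabled() {
    const char* value = std::getenv("CPPUNIT_TIMETESTS");
    return value != nullptr && std::string(value) != "0";
}

// Mirrors the startTest/endTest hooks of a test listener and reports
// "<test name>: <n>ms", the same shape as the log lines in this thread.
class TestTimer {
public:
    explicit TestTimer(std::ostream& out = std::cout) : out_(out) {}

    void startTest(const std::string& name) {
        name_ = name;
        start_ = std::chrono::steady_clock::now();
    }

    // Writes the timing line to the stream and also returns it.
    std::string endTest() {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        const auto ms =
            std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count();
        const std::string line = name_ + ": " + std::to_string(ms) + "ms";
        out_ << line << '\n';
        return line;
    }

private:
    std::ostream& out_;
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};
```

In a launcher, the timer would only be attached when `timingEnabled()` returns true, so ordinary runs stay quiet; inside a real CppUnit listener the name could come from `test->getName()`. `steady_clock` is used because, unlike the wall clock, it is not affected by system time adjustments during a run.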