Re: [Libreoffice] Proposal: slowcheck -- some numbers for sc

2011-09-23 Thread Lubos Lunak
On Thursday 22 of September 2011, Kohei Yoshida wrote:
 On Thu, 2011-09-22 at 17:09 +0100, Caolán McNamara wrote:
  Seems to spend all its time getting and setting optimal row
  heights/widths or something like that in ods/xlsx mode.

 Yes.  That's an infamous bottleneck.  Calc's row height adjustment on
 file load is pretty darn expensive.  It basically re-calculates the
 heights of all rows regardless.
...
 Not sure if ods stores row heights, but if it does, then we can do the
 same thing for ods too.  That's worth a check.

 Even if ods does not officially store it, presumably you could have an
LO-specific field where LO caches the value for faster reading anyway, no?

-- 
 Lubos Lunak
 l.lu...@suse.cz


Re: [Libreoffice] Proposal: slowcheck -- some numbers for sc

2011-09-22 Thread Caolán McNamara
On Wed, 2011-09-21 at 11:28 +0100, Michael Meeks wrote:
 It'd be really interesting to get a callgrind trace of ods / xlsx loading.

oh, as an aside, this is easy to get with
cd sc
export VALGRIND=callgrind 
make -sr

Here's mine FWIW http://www.skynet.ie/~caolan/pub/tmp/callgrind.out.4171

Seems to spend all its time getting and setting optimal row
heights/widths or something like that in ods/xlsx mode.

C.



Re: [Libreoffice] Proposal: slowcheck -- some numbers for sc

2011-09-22 Thread Michael Stahl
On 21.09.2011 13:08, Lubos Lunak wrote:
 On Wednesday 21 of September 2011, Michael Meeks wrote:
 On Wed, 2011-09-21 at 09:50 +0100, Caolán McNamara wrote:
 In an ideal world I imagine the best spent effort would be on improving
 the import speed for .ods and .xlsx, seeing as that improves the
 real-world case too.

  Agreed. Assuming that the files are of equivalent on-disk size, it's
 amusing that the old code is fastest and the newest slowest. It'd be
 really interesting to get a callgrind trace of ods / xlsx loading.

well nobody ever claimed that XML is faster to parse than binary gunk; it
is, however, far easier to import in C/C++ without accidentally
compromising the user's system in the process  :-/

  Amusing maybe, but not strange at all. XLS is a binary format, ODS and
 XLSX are XML-based. Any guesses on how much slower image loading would
 be if somebody came up with JPEGX? And, if the calc import filters are
 written in an at least somewhat similar way to the writer import
 filters, then the XML-based filters get an additional penalty for XSL
 processing and abstractions, especially in a non-optimized build.

AFAIK importing an ODF or OOXML file won't use any XSLT stuff; that is
only for less popular formats.


-- 
This article, then, is a serious analysis of a ridiculous subject,
 which is of course the opposite of what is usual in economics.
 -- Paul Krugman, The Theory Of Interstellar Trade,
Economic Inquiry, Vol. 48(4), p. 1119-1123



Re: [Libreoffice] Proposal: slowcheck -- some numbers for sc

2011-09-22 Thread Kohei Yoshida
On Thu, 2011-09-22 at 17:09 +0100, Caolán McNamara wrote:
 On Wed, 2011-09-21 at 11:28 +0100, Michael Meeks wrote:
  It'd be really interesting to get a callgrind trace of ods / xlsx loading.
 
 oh, as an aside, this is easy to get with
 cd sc
 export VALGRIND=callgrind 
 make -sr
 
 Here's mine FWIW http://www.skynet.ie/~caolan/pub/tmp/callgrind.out.4171
 
 Seems to spend all its time getting and setting optimal row
 heights/widths or something like that in ods/xlsx mode.

Yes.  That's an infamous bottleneck.  Calc's row height adjustment on
file load is pretty darn expensive.  It basically re-calculates the
heights of all rows regardless.

The xls import used to suffer from the same thing, but we've replaced
that with using the row heights stored in the xls directly, which
coincidentally improved layout preservation with Excel documents as well
(and loads much more quickly).

In theory we could skip that for xlsx too, by doing the same thing we
did for xls, since xlsx also stores row heights in the document.

Not sure if ods stores row heights, but if it does, then we can do the
same thing for ods too.  That's worth a check.

As an aside, dbf and csv import had the same issue, but I've reduced the
need to re-calc row heights to only those rows that really need
re-calculation.  Needless to say, that resulted in a much faster import,
especially for large dbf and csv documents.  Much faster.
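
To make the approach concrete, here is a minimal sketch of the idea
(hypothetical names only, not Calc's actual classes): rows that carried
an explicit height in the imported file keep it, and only the remaining
rows pay for the optimal-height calculation.

#include <cstddef>
#include <vector>

struct RowInfo
{
    bool bHasStoredHeight; // an explicit height was read from the file
    long nHeightTwips;     // height from the file, or to be calculated
};

// Recalculate optimal heights only where the file gave us nothing,
// instead of unconditionally recalculating every row on load.
void applyRowHeightsAfterImport(std::vector<RowInfo>& rRows,
                                long (*pfnCalcOptimalHeight)(std::size_t))
{
    for (std::size_t nRow = 0; nRow < rRows.size(); ++nRow)
    {
        if (!rRows[nRow].bHasStoredHeight)
            rRows[nRow].nHeightTwips = pfnCalcOptimalHeight(nRow);
    }
}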

Kohei



Re: [Libreoffice] Proposal: slowcheck -- some numbers for sc

2011-09-21 Thread Caolán McNamara
On Tue, 2011-09-20 at 11:27 +0200, Markus Mohrhard wrote:
 The sc unit tests are slow because we use the filter-tests as a way
 to test not only the import but also the first recalculation. We now
 have about 10 files that we import and check, and I have some
 additional ones that are not yet finished.

So, here are my numbers from sc: --enable-dbgutil build, linux, x86_64.
I did the make non-parallel; in -jX mode the two sc unit tests would be
run in parallel.

a) all already built, make in sc, i.e. check depends and do nothing

time make -sr build
real    0m4.115s

So there's a 4 second make setup cost there already for me.

b) rebuild one file, i.e. ccache + link

touch source/core/data/attarray.cxx
time make -sr build
real    0m43.462s

Ouch, but as expected, linking isn't very fast.

c) all built, make sc, i.e. check depends and run tests

time make -sr
real    0m11.275s

So there's apparently about 7 seconds spent in the tests; that's quite
long alright, much less than a link, but still fairly long.

So I've now added a timer listener to our cppunit launcher to tell us
how long each test takes[1] and indeed it takes about 70ms to load a
.xls, 370ms for an equivalent .ods and about 500ms(?!) for an
equivalent .xlsx
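
For illustration, the listener is roughly along these lines (a sketch
only, the actual TIMETESTS code in cppunittester.cxx may differ):

#include <cppunit/Test.h>
#include <cppunit/TestListener.h>
#include <cstdio>
#include <ctime>

class TimingListener : public CppUnit::TestListener
{
    std::clock_t m_nStart;
public:
    TimingListener() : m_nStart(0) {}
    virtual void startTest(CppUnit::Test*)
    {
        m_nStart = std::clock();
    }
    virtual void endTest(CppUnit::Test* pTest)
    {
        // std::clock() measures CPU time; a wall clock would also do.
        double fMs = 1000.0 * (std::clock() - m_nStart) / CLOCKS_PER_SEC;
        std::printf("%s: %.0fms\n", pTest->getName().c_str(), fMs);
    }
};

// registered on the CppUnit::TestResult before the tests run, e.g.
//   TimingListener aTimer;
//   aResult.addListener(&aTimer);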

So the problem isn't that loading files is inherently slow but that
loading .ods and .xlsx files is slow [2]

In an ideal world I imagine the best spent effort would be on improving
the import speed for .ods and .xlsx, seeing as that improves the
real-world case too.

As a quick hack, it's possibly a fixed overhead cost per filter, so
bundling the various .ods and .xlsx tests together into one file per
format might knock a lot off the unit test time. It may then be the case
that additional content added to a single .ods/.xlsx is a tiny cost
relative to the fixed cost of loading one in the first place.

Or stick to xls for tests, that filter is faster :-)

[1] you need to define TIMETESTS in cppunittester/cppunittester.cxx to
get the timing information; maybe this is worth hooking off some command
line option or env variable? (see the sketch below)
[2] in --enable-dbgutil mode anyway
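
Something like this in the launcher would do for the env variable route
(just a sketch; the variable name is made up):

#include <cstdlib>

// enable per-test timing at runtime instead of recompiling with -DTIMETESTS
bool shouldTimeTests()
{
    return std::getenv("CPPUNITTESTER_TIMETESTS") != 0;
}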

C.



Re: [Libreoffice] Proposal: slowcheck -- some numbers for sc

2011-09-21 Thread Michael Meeks

On Wed, 2011-09-21 at 09:50 +0100, Caolán McNamara wrote:
 So I've now added a timer listener to our cppunit launcher to tell us
 how long each test takes[1] and indeed it takes about 70ms to load a
 .xls, 370ms for an equivalent .ods and about 500ms(?!) for an
 equivalent .xlsx

Ooh :-) pretty numbers indeed, how can I reproduce them ? I notice they
didn't go to the console (which is a shame), and couldn't see them in
the sc_filters_test.log file either (oddly).

 In an ideal world I imagine the best spent effort would be on improving
 the import speed for .ods and .xlsx, seeing as that improves the
 real-world case too.

Agreed. Assuming that the files are of equivalent on-disk size, it's
amusing that the old code is fastest and the newest slowest. It'd be
really interesting to get a callgrind trace of ods / xlsx loading.

Hey ho,

Michael.

-- 
 michael.me...@novell.com  , Pseudo Engineer, itinerant idiot




Re: [Libreoffice] Proposal: slowcheck -- some numbers for sc

2011-09-21 Thread Lubos Lunak
On Wednesday 21 of September 2011, Michael Meeks wrote:
 On Wed, 2011-09-21 at 09:50 +0100, Caolán McNamara wrote:
  In an ideal world I imagine the best spent effort would be on improving
  the import speed for .ods and .xlsx, seeing as that improves the
  real-world case too.

   Agreed. Assuming that the files are of equivalent on-disk size, it's
 amusing that the old code is fastest and the newest slowest. It'd be
 really interesting to get a callgrind trace of ods / xlsx loading.

 Amusing maybe, but not strange at all. XLS is a binary format, ODS and
XLSX are XML-based. Any guesses on how much slower image loading would be
if somebody came up with JPEGX? And, if the calc import filters are
written in an at least somewhat similar way to the writer import filters,
then the XML-based filters get an additional penalty for XSL processing
and abstractions, especially in a non-optimized build.

-- 
 Lubos Lunak
 l.lu...@suse.cz


Re: [Libreoffice] Proposal: slowcheck -- some numbers for sc

2011-09-21 Thread Caolán McNamara
On Wed, 2011-09-21 at 11:28 +0100, Michael Meeks wrote:
 On Wed, 2011-09-21 at 09:50 +0100, Caolán McNamara wrote:
  So I've now added a timer listener to our cppunit launcher to tell us
  how long each test takes[1] and indeed it takes about 70ms to load a
  .xls, 370ms for an equivalent .ods and about 500ms(?!) for an
  equivalent .xlsx
 
   Ooh :-) pretty numbers indeed, how can I reproduce them ? I notice they
 didn't go to the console (which is a shame), and couldn't see them in
 the sc_filters_test.log file either (oddly).

cd sal
edit cppunittester/cppunittester.cxx, define TIMETESTS
build
(should see the times for the sal cppunit tests on stdout because those
ones are not gbuild-ified)
deliver -link; cd sc; make -sr;
grep ms ../workdir/unxlngx6/CppunitTest/sc_filters_test.test.log
FiltersTest::testRangeName: 525ms
FiltersTest::testContent: 1146ms
FiltersTest::testFunctions: 411ms
FiltersTest::testDatabaseRanges: 424ms
FiltersTest::testFormats: 1039ms
FiltersTest::testBugFixesODS: 408ms
FiltersTest::testBugFixesXLS: 61ms
FiltersTest::testBugFixesXLSX: 350ms

For my claim about the times per format, I edited
FiltersTest::testFormats and changed the loop three times to get three
timings for FiltersTest::testFormats as if it did just one format at a
time. Those tally fairly closely with the individual per-format
FiltersTest::testBugFixesXX times shown above.

The times are spat out to stdout, though in the gbuild-ified modules the
output is only shown (at the gbuild layer) if a test fails.

  In an ideal world I imagine the best spent effort would be on improving
  the import speed for .ods and .xlsx, seeing as that improves the
  real-world case too.
 
   Agreed. Assuming that the files are of equivalent on-disk size.

If someone has the time, sticking a basically empty .ods/.xlsx file in
there would also be worth following up on, i.e. how long does loading an
empty one take :-)

It might all be a bit easier to hack on some of the performance things
at the stripped-down unit-test loading level. And my times are with
--enable-dbgutil, so they may be totally irrelevant for real-world
.ods/.xlsx loading, even if obviously relevant for hacking with dbgutil
builds.

C.
