Re: [PATCH 3-5] fdo#47644 performance regression on largish .doc #2 ...

2012-05-14 Thread Caolán McNamara
On Sun, 2012-05-13 at 00:32 +0100, Michael Meeks wrote:
>   So - I re-worked this to simplify it and incrementally build the page
> chain cache, which might help performance in nasty corner cases

FWIW, I think it looks pretty good now. I'd initially wanted to
build up an incremental chain, but knew I wouldn't get even the simple
case right in any reasonable length of time. Glad I didn't try, given
the "busted storages, but we happen to stick to the fairly ok bits when
reading" case :-)

> It'd be worth testing any documents we know of, where previously we
> could recover some of the document content from the beginning of the
> stream, where the end was corrupt (I guess).

I don't know of any other specific examples outside of the existing
"special" ole2 cases we have in the regression tests. At least
collecting those together as cppunit tests has had some payoff.

C.

___
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice


Re: [PATCH 3-5] fdo#47644 performance regression on largish .doc #2 ...

2012-05-12 Thread Michael Meeks

On Fri, 2012-05-11 at 22:22 +0100, Michael Meeks wrote:
>   Argh - and it's entirely possible that this breaks the
> CVE-2010-3454-1.doc test on -3-5 - but it's rather too late to double
> check that now; seems to pass on master though; most odd. Will poke
> Monday.

Wow - this was a -really- 'fun' problem to nail ;-) It turns out that
simply walking the FAT chain pollutes the state of the streams in a way
that is extraordinarily hard to unwind; i.e. even just calling the
original makePageChainCache method (or similar) would seek beyond the end
of the stream, putting it in some unrecoverable state (or somesuch).

Anyhow - after spending a big chunk of life working that out, it dawned
on me that there is no need for the page-chain cache building to be
slow, and that we should do it always, and incrementally as we read.
Hopefully that'll still allow us to recover parts of word documents that
are not seekable.

So - I re-worked this to simplify it and incrementally build the page
chain cache, which might help performance in nasty corner cases, and also
wrote some regression tests [ which are hairy; the sot/ 'pass' documents
have some nice instances of broken FAT chains ;-].

The 'slow.doc' parses in ~4 seconds for me now; though there is still
some hyper-long, painfully incomprehensible 15-second thrash after that
(with no progress bar) ;-)

I'd like to get this reviewed, more widely tested and into -3-5 (as yet
it's not in master either, this is vs. -3-5 ;-) It'd be worth testing
any documents we know of, where previously we could recover some of the
document content from the beginning of the stream, where the end was
corrupt (I guess).

Thanks !

Michael.

-- 
michael.me...@suse.com  <><, Pseudo Engineer, itinerant idiot
From fe0e7c2664a6fc9e3fae4404d41645cecc9e2624 Mon Sep 17 00:00:00 2001
From: Michael Meeks 
Date: Fri, 11 May 2012 20:19:21 +0100
Subject: [PATCH] sot: re-work OLE2 offset-to-page computation with a FAT
 chain cache

---
 sot/qa/cppunit/test_sot.cxx|   74 -
 sot/source/sdstor/stgstrms.cxx |  144 +---
 sot/source/sdstor/stgstrms.hxx |3 +
 3 files changed, 180 insertions(+), 41 deletions(-)

diff --git a/sot/qa/cppunit/test_sot.cxx b/sot/qa/cppunit/test_sot.cxx
index 36d0b02..0f91bdf 100644
--- a/sot/qa/cppunit/test_sot.cxx
+++ b/sot/qa/cppunit/test_sot.cxx
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 
 using namespace ::com::sun::star;
 
@@ -45,6 +46,11 @@ namespace
 public:
 SotTest() {}
 
+bool checkStream( const SotStorageRef &xObjStor,
+  const String &rStreamName,
+  sal_uLong nSize );
+bool checkStorage( const SotStorageRef &xObjStor );
+
 virtual bool load(const rtl::OUString &,
 const rtl::OUString &rURL, const rtl::OUString &);
 
@@ -55,12 +61,78 @@ namespace
 CPPUNIT_TEST_SUITE_END();
 };
 
+bool SotTest::checkStream( const SotStorageRef &xObjStor,
+   const String &rStreamName,
+   sal_uLong nSize )
+{
+unsigned char *pData = (unsigned char*)malloc( nSize );
+sal_uLong nReadableSize = 0;
+if( !pData )
+return true;
+
+{   // Read the data in one block
+SotStorageStreamRef xStream( xObjStor->OpenSotStream( rStreamName ) );
+xStream->Seek(0);
+sal_uLong nRemaining = xStream->GetSize() - xStream->Tell();
+
+CPPUNIT_ASSERT_MESSAGE( "check size", nRemaining == nSize );
+CPPUNIT_ASSERT_MESSAGE( "check size #2", xStream->remainingSize() == nSize );
+
+// Read as much as we can, a corrupted FAT chain can cause real grief here
+nReadableSize = xStream->Read( (void *)pData, nSize );
+//fprintf(stderr, "readable size %d vs size %d remaining %d\n", nReadableSize, nSize, nReadableSize);
+}
+{   // Read the data backwards as well
+SotStorageStreamRef xStream( xObjStor->OpenSotStream( rStreamName ) );
+for( sal_uLong i = nReadableSize; i > 0; i-- )
+{
+CPPUNIT_ASSERT_MESSAGE( "sot reading error", !xStream->GetError() );
+unsigned char c;
+xStream->Seek( i - 1 );
+CPPUNIT_ASSERT_MESSAGE( "sot storage reading byte",
+xStream->Read( &c, 1 ) == 1);
+CPPUNIT_ASSERT_MESSAGE( "mismatching data storage reading byte",
+pData[i - 1] == c );
+}
+}
+
+return true;
+}
+
+bool SotTest::checkStorage( const SotStorageRef &xObjStor )
+{
+SvStorageInfoList aInfoList;
+xObjStor->FillInfoList( &aInfoList );
+
+for( SvStorageInfoList::iterator aIt = aInfoList.begin();
+ aIt != aInfoList.end();