On 05/30/2013 02:51 PM, Ehresman,David E. wrote:

> We use Docave.  IBM used to resale this as TSM for Sharepoint.  They
> are no longer doing that but you can still get the same functionality
> directly from Docave.


M.o.l.a.s.s.e.s.

Seriously.

When it takes hours and hours and hours AND HOURS, that's not a bug,
that's how it works.

As far as I can tell, it goes like this:

They started with a big website.

To be efficient, they wanted it inside of a database, so they put it in
a database.

They wanted to keep lots and lots of documents in the website
(database), so they put blobs in the database.

Lots of blobs are hard to organize, so they wanted a familiar
organizational schema.  People are familiar with a filesystem, so they
made it look sort of like a file system.

Sharepoint is well optimized for seamless integration with the Win
desktop, and that mode of utilization is explicitly encouraged in lots
of places, so e.g. your project notes and stuff all go in "website"
directories.  Including rudimentary version control.  So the blob count
in the database skyrockets.

Access control metadata ramifies, so you've got a metadata vocabulary
that is more complex and nuanced than just the NTFS universe:  your
access control includes opinions about the outside world, too.  (Mostly
"NO!", but you've got to write that down.)


... All of this may be semi-obvious, but the point of restating it is to
bring to the forefront of your mind that, from the back-end view,
Sharepoint has a great deal in common with a complex, customized,
locally-hacked-upon filesystem implementation.

So, what DocAve does, is it walks that filesystem and does an
incremental.  But instead of reading something optimized, down to the
block level, for rapid access, it's doing a series of database accesses.

Think about how long ls -l would take if every file metadata read
required two or three index scans and their indicated block retrievals.
Ick.
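
To put rough numbers on that, here is a back-of-the-envelope sketch in
Python.  Every figure in it (the item count, both per-lookup latencies)
is a made-up assumption for illustration, not a measurement of DocAve
or Sharepoint; only the "two or three lookups per metadata read" shape
comes from the description above.

```python
# Back-of-the-envelope model; all numbers below are hypothetical.

def walk_seconds(item_count, lookups_per_item, seconds_per_lookup):
    """Estimate the time to read metadata once for every item."""
    return item_count * lookups_per_item * seconds_per_lookup

ITEMS = 5_000_000            # assumed number of documents and versions
FS_STAT_SECONDS = 0.00001    # assumed cost of one cached filesystem stat
DB_LOOKUP_SECONDS = 0.001    # assumed cost of one index scan plus block fetch

# A filesystem walk: one cheap metadata read per item.
fs_walk = walk_seconds(ITEMS, 1, FS_STAT_SECONDS)

# A database-backed walk: three index lookups per item.
db_walk = walk_seconds(ITEMS, 3, DB_LOOKUP_SECONDS)

print(f"filesystem walk: {fs_walk / 60:.1f} minutes")
print(f"database walk:   {db_walk / 3600:.1f} hours")
```

Plug in your own counts and latencies; the point is only that the
per-item cost gets multiplied by every item on every incremental.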


... Our sharepoint guy, Joe Gasper (Hi, Joe!)  thinks I'm not talking
too much bullshit with this description, and he adds that exporting the
BLOBs is a critical performance enhancement for DR, and substantial for
normal access, too.  Well tuned, he thinks a 95% reduction in the SQL
database size is readily achievable.  At that rate, you can fit the
remaining 5% on SSD...
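
For a sense of scale, the arithmetic looks like this.  The 2000 GB
starting size is a hypothetical figure picked for round numbers; the
95% comes from Joe's estimate above.

```python
def remaining_gb(total_gb, reduction_percent):
    """Size left in the SQL database after moving BLOBs out."""
    return total_gb * (100 - reduction_percent) // 100

CONTENT_DB_GB = 2000   # hypothetical content database size

left_in_sql = remaining_gb(CONTENT_DB_GB, 95)
moved_out = CONTENT_DB_GB - left_in_sql

print(f"left in SQL: {left_in_sql} GB, moved out: {moved_out} GB")
```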


- Allen S. Rout
