Andreas,

I gained some experience putting 15'000 documents into TEXT columns and
indexing them. That is not quite 15 GB, but your amount of data will
shrink when you go from *.doc to pure ASCII. My table files are:

-rw-rw----    1 mysql    daemon   187621564 Jan  8 20:02 plaintext.MYD
-rw-rw----    1 mysql    daemon    92874752 Jan  8 20:04 plaintext.MYI
-rw-rw----    1 mysql    daemon        8648 Jan  8 20:02 plaintext.frm

Insert the texts into the table first, and only afterwards create the
fulltext index with an ALTER TABLE statement. Having lots of RAM available
speeds up index creation; otherwise it can be a lengthy process. You also
have to increase some buffers.
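
A minimal sketch of that order of operations, written as a small Python
helper that only builds the SQL strings (it does not connect to a server).
The table name is taken from the file listing above; the column names `id`
and `body`, the index name, and the choice of myisam_sort_buffer_size with
a 256 MB value are illustrative assumptions, not the exact statements or
settings used for this table.

```python
def fulltext_load_plan(table="plaintext", column="body",
                       sort_buffer_bytes=256 * 1024 * 1024):
    """Return SQL statements in the order: create, load, tune, index, search.

    Column name, index name and buffer size are hypothetical placeholders.
    The %s markers are parameter placeholders for a DB-API driver.
    """
    return [
        # 1. Plain table without a FULLTEXT index, so bulk inserts stay cheap.
        f"CREATE TABLE {table} ("
        f"id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, "
        f"{column} TEXT NOT NULL)",
        # 2. Load every extracted document text (run once per document).
        f"INSERT INTO {table} ({column}) VALUES (%s)",
        # 3. Assumed tuning step: enlarge the MyISAM sort buffer for this session.
        f"SET SESSION myisam_sort_buffer_size = {sort_buffer_bytes}",
        # 4. Build the fulltext index in one pass after all rows are loaded.
        f"ALTER TABLE {table} ADD FULLTEXT INDEX ft_{column} ({column})",
        # 5. Natural-language search against the finished index.
        f"SELECT id FROM {table} WHERE MATCH({column}) AGAINST (%s)",
    ]


if __name__ == "__main__":
    for statement in fulltext_load_plan():
        print(statement + ";")
```

The point of the ordering is that the index is built once over the
complete data set instead of being updated on every insert.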

My model consists of the decisions from the Swiss Federal Court since
1954. Search runs are quite fast and I am happy with the results. BOOLEAN
MODE still appears buggy but this situation may change shortly. Mail me
off-list if you like.

Thomas


On Wed, 20 Mar 2002, Andreas Dau wrote:

> > >>  >I have to plan a content retrieval system and am thinking of using
> > >>  >mysql cause it's always been very reliable and convenient for my
> > >>  >needs. Now, I do not have any experiences with databases of this size.
> > >>  >
> > >>  >The situation is as follows:
> > >>  >We have round about 15gig of user documents (mainly MS Office
> > >>  >documents such as doc or ppt).
> > >>
> > >>  If you're storing those in native format, then they're not text
> > >>  documents and FULLTEXT searches are unlikely to benefit you.  At
> > >>  least, I wouldn't guess so.
> > >
> > >Oh no, sorry I was not being clear. Of course I am thinking of a script
> > >that stores the _real_ text information into the field, which in
> > >addition will lower down the amount of data enormously since MS Office
> > >documents have a very poor snr *g*
> > 
> > In that case, I'd say the best way to find out how well it'll work is
> > to try it.  Go for it!
> 
> Well, you're perfectly right, that would be the best way. Unfortunately
> that is, as usual, impossible since I have no comparable testing
> environment. Even if I took the effort and put all the documents onto
> my local server, it's nowhere near a high performance server, and
> furthermore I do not have 50 people sitting here in my development LAN
> accessing all the different imaginable services. So I'd be really happy
> to get some _experience_ report. Is that understandable?
> 
> > >>  >My question is: can this be done with MySQL? What hardware (server)
> > >>  >would be needed to get the results in a time anywhere near
> > >>  >reasonable? And, do the clients need to be tweaked or would it be
> > >>  >acceptable to use a Browser for it (php on an Apache)?
> > 
> > The client shouldn't matter all that much.  The real work is being done
> > on the server side -- unless you're planning to send several megabytes
> > of search result to the client each time. :-)
> 
> lol, naw *g*

