Andreas,

I gained some experience putting 15'000 documents into TEXT columns and
indexing them. That is not quite 15 GB, but your amount of data will
shrink when you go from *.doc to plain ASCII. My table size is:

-rw-rw---- 1 mysql daemon 187621564 Jan 8 20:02 plaintext.MYD
-rw-rw---- 1 mysql daemon  92874752 Jan 8 20:04 plaintext.MYI
-rw-rw---- 1 mysql daemon      8648 Jan 8 20:02 plaintext.frm

Insert the texts into the table first and create the FULLTEXT index
afterwards with an ALTER TABLE statement. Having lots of RAM available
speeds up index creation; otherwise it may be a lengthy process. You
will also have to increase some buffers.

My collection consists of the decisions of the Swiss Federal Court
since 1954. Searches run quite fast and I am happy with the results.
BOOLEAN MODE still appears buggy, but this situation may change shortly.

Mail me off-list if you like.

Thomas

On Wed, 20 Mar 2002, Andreas Dau wrote:

> > >> >I have to plan a content retrieval system and am thinking of
> > >> >using mysql cause it's always been very reliable and convenient
> > >> >for my needs.
> > >> >Now, I do not have any experiences with databases of this size.
> > >> >
> > >> >The situation is as follows:
> > >> >We have round about 15gig of user documents (mainly MS Office
> > >> >documents such as doc or ppt).
> > >>
> > >> If you're storing those in native format, then they're not text
> > >> documents and FULLTEXT searches are unlikely to benefit you. At
> > >> least, I wouldn't guess so.
> > >
> > >Oh no, sorry I was not being clear. Of course I am thinking of a
> > >script that stores the _real_ text information into the field,
> > >which in addition will lower down the amount of data enormously
> > >since MS Office documents have a very poor snr *g*
> >
> > In that case, I'd say the best way to find out how well it'll work is
> > to try it. Go for it!
>
> Well you're perfectly right that would be the best way. Unfortunately
> that is, as usual impossible since I have no comparable testing
> environment. Even if I'd take the effort and put all the documents into
> my local server, it's nowhere near to a high performance server and
> furthermore I do not have 50 people sitting here in my development LAN
> accessing all the different imaginable services. So I'd be really happy
> to get some _experience_ report. Is that understandable?
>
> > >> >My question is: can this be done with MySQL? What hardware
> > >> >(server) would be needed to get the results in a time anywhere
> > >> >near reasonable?
> > >> >And, do the clients need to be tweaked or would it be acceptable
> > >> >to use a Browser for it (php on an Apache)?
> >
> > The client shouldn't matter all that much. The real work is being
> > done on the server side -- unless you're planning to send several
> > megabytes of search result to the client each time. :-)
>
> lol, naw *g*