* Pascal Francq
> I have a question. I need to store documents (>100,000) and the words
> contained in them.
>
> There are two ways to do the job:
> 1) The documents are stored in a table 'docs' and identified by
> a 'docid'. The words are then stored in a table 'docsbywords'
> where each pair (docid, word) is stored.
> 2) The documents are stored in a table 'docs' and identified by
> a 'docid'. But for each document a separate table is created
> ('doc1bywords', 'doc2bywords', ...) that contains only the words
> of that document.
>
> The first solution uses two tables, one of which can become very
> large. The second uses many tables, none of them large.
> Is one of the solutions better than the other with regard to the way
> MySQL handles tables and rows? (In fact, for practical reasons, I
> prefer the second solution.)

3) The documents are stored in a table 'docs' and identified by a
'docid'. The words are stored in a table 'words', one row for each
unique word, identified by 'wordid'. A third table 'worddoc' contains
the columns 'docid' and 'wordid', with two unique compound indexes
defined on it: one on ('wordid','docid') and one on ('docid','wordid').

There will be many rows in 'worddoc', but each row will be small.

-- 
Roger
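
A rough sketch of option 3, for illustration only. It uses Python's built-in sqlite3 module rather than MySQL so that it runs without a server; the table names, key columns and the two compound indexes follow the description above, while the 'title' column, the helper function names and the whitespace tokenization are assumptions made up for the example. In MySQL the DDL would differ slightly (for instance AUTO_INCREMENT keys, and INSERT IGNORE instead of SQLite's INSERT OR IGNORE).

```python
import sqlite3


def build_schema(conn):
    """Create the three tables and the two compound unique indexes."""
    conn.executescript("""
        CREATE TABLE docs (
            docid INTEGER PRIMARY KEY,
            title TEXT NOT NULL            -- hypothetical column
        );
        CREATE TABLE words (
            wordid INTEGER PRIMARY KEY,    -- assigned automatically by SQLite
            word   TEXT NOT NULL UNIQUE    -- one row per unique word
        );
        CREATE TABLE worddoc (
            docid  INTEGER NOT NULL REFERENCES docs(docid),
            wordid INTEGER NOT NULL REFERENCES words(wordid)
        );
        -- One index per lookup direction, as suggested in the reply.
        CREATE UNIQUE INDEX worddoc_word_doc ON worddoc (wordid, docid);
        CREATE UNIQUE INDEX worddoc_doc_word ON worddoc (docid, wordid);
    """)


def add_document(conn, docid, title, text):
    """Store a document and link it to each distinct word it contains."""
    conn.execute("INSERT INTO docs (docid, title) VALUES (?, ?)", (docid, title))
    # Assumption: words are lowercased, whitespace-separated tokens.
    for word in set(text.lower().split()):
        # SQLite syntax; skips the insert if the word already exists.
        conn.execute("INSERT OR IGNORE INTO words (word) VALUES (?)", (word,))
        (wordid,) = conn.execute(
            "SELECT wordid FROM words WHERE word = ?", (word,)
        ).fetchone()
        conn.execute(
            "INSERT INTO worddoc (docid, wordid) VALUES (?, ?)", (docid, wordid)
        )


def docs_containing(conn, word):
    """Return the docids of all documents containing the word, ascending."""
    rows = conn.execute(
        "SELECT wd.docid FROM worddoc wd "
        "JOIN words w ON w.wordid = wd.wordid "
        "WHERE w.word = ? ORDER BY wd.docid",
        (word.lower(),),
    )
    return [row[0] for row in rows]


def words_in(conn, docid):
    """Return the distinct words of one document, sorted alphabetically."""
    rows = conn.execute(
        "SELECT w.word FROM worddoc wd "
        "JOIN words w ON w.wordid = wd.wordid "
        "WHERE wd.docid = ? ORDER BY w.word",
        (docid,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_schema(conn)
    add_document(conn, 1, "first", "the quick brown fox")
    add_document(conn, 2, "second", "the lazy dog")
    print(docs_containing(conn, "the"))
    print(words_in(conn, 2))
```

Each 'worddoc' row holds only two integers, so the table stays compact even with many rows, and either index can answer "which documents contain this word" or "which words are in this document" without touching the other tables first.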