Union Performance Question
Hi everyone, I have a question regarding the performance of UNION queries: I need to do a full-text search against a large number of rows. Is it faster to have one table with 10,000,000 text rows and perform one full-text search. Or, am I better off having 10 smaller more managable tables and performing UNION'ed full-text searches against them. A merge table seems like it would be the ideal solution, but the limitation on full-text indexing prohibits it. Thanks! -Dan -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
RE: Union Performance Question
Use 10 smaller tables and perform a union. It's faster to look up in smaller tables then larger ones-generally. Additionally more of the key_buffer can be used for tables with the most hits over the tables with less hits, making the lookup sustain speed over time. -Original Message- From: Dan Salzer [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 11, 2005 11:46 AM To: mysql@lists.mysql.com Subject: Union Performance Question Hi everyone, I have a question regarding the performance of UNION queries: I need to do a full-text search against a large number of rows. Is it faster to have one table with 10,000,000 text rows and perform one full-text search. Or, am I better off having 10 smaller more managable tables and performing UNION'ed full-text searches against them. A merge table seems like it would be the ideal solution, but the limitation on full-text indexing prohibits it. Thanks! -Dan -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Union Performance Question
Use one big table. A merge table will run the same query over all 10 tables. The key buffer is filled from the top down so if you have a key buffer that looks like this: a / \ /\ b c / \/ \ de fg Almost all queries for that index will be able to use the buffer for 'a' and 'b'. It's not until they get things very narrowed down that you have to actually hit the index files for specific leaves. Creating ten small tables creates ten duplicate 'a' and 'b' sections which polutes the key buffer. -Eric Dathan Pattishall wrote: Use 10 smaller tables and perform a union. It's faster to look up in smaller tables then larger ones-generally. Additionally more of the key_buffer can be used for tables with the most hits over the tables with less hits, making the lookup sustain speed over time. -Original Message- From: Dan Salzer [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 11, 2005 11:46 AM To: mysql@lists.mysql.com Subject: Union Performance Question Hi everyone, I have a question regarding the performance of UNION queries: I need to do a full-text search against a large number of rows. Is it faster to have one table with 10,000,000 text rows and perform one full-text search. Or, am I better off having 10 smaller more managable tables and performing UNION'ed full-text searches against them. A merge table seems like it would be the ideal solution, but the limitation on full-text indexing prohibits it. Thanks! -Dan -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
RE: Union Performance Question
Use one big table. A merge table will run the same query over all 10 tables. The key buffer is filled from the top down so He is using a fulltext index he can't use merge tables. If he where to UNION across the tables being used assuming he uses the tables that only have the data he would get a good boost in performance. The performance comes from a smaller in time lock on the table and only data that is being accessed the most would stay in the buffer. He has to pick a good hash such as date or country code for splitting tables on. To explain why let's look at the key buffer structure. Having multiple tables do not pollute the buffer this is why: mySQL pulls OS MYI file blocks then puts the frequently used blocks in the buffer. The index file itself stores the data in a B-Tree+ (or R-Tree) so a smaller index file means more of it can exist in the keybuffer. Since an index file is associated with a table the most frequently used tables' indexes will have more of the key buffer-that assumed: if the application hashes the table correctly then a boost in performance can be gained by using more memory for index blocks that have more pertinent data then not. If your using INNODB (no FULLTEXT INDEX option) use 1 big table. if you have a key buffer that looks like this: a / \ /\ b c / \/ \ de fg Almost all queries for that index will be able to use the buffer for 'a' and 'b'. It's not until they get things very narrowed down that you have to actually hit the index files for specific leaves. Creating ten small tables creates ten duplicate 'a' and 'b' sections which polutes the key buffer. -Eric Dathan Pattishall wrote: Use 10 smaller tables and perform a union. It's faster to look up in smaller tables then larger ones-generally. Additionally more of the key_buffer can be used for tables with the most hits over the tables with less hits, making the lookup sustain speed over time. -Original Message- From: Dan Salzer [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 11, 2005 11:46 AM To: mysql@lists.mysql.com Subject: Union Performance Question Hi everyone, I have a question regarding the performance of UNION queries: I need to do a full-text search against a large number of rows. Is it faster to have one table with 10,000,000 text rows and perform one full-text search. Or, am I better off having 10 smaller more managable tables and performing UNION'ed full-text searches against them. A merge table seems like it would be the ideal solution, but the limitation on full-text indexing prohibits it. Thanks! -Dan -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Union Performance Question
Most helpfull! Thanks! This brings me to my next question I have a table that looks like this: CREATE TABLE `Article_Search` ( `ArticleID` int(11) NOT NULL default '0', `Content` text NOT NULL, PRIMARY KEY (`ArticleID`), FULLTEXT KEY `Content` (`Content`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 This table has several million rows, but I only want to search a subset of the table. IE: SELECT * FROM Article_Search WHERE MATCH(Content) AGAINST('rubber duckies' IN BOOLEAN MODE) AND ArticleID IN (100, 23, 223, 98, 4018, 1452, 91) The reason I'm specifying a set of ArticleIDs is that I know any hits are going to be within those articles. So the presence of the IN() clause is purely there for performance. However, an explain on this Statement shows that it is using the Full-Text index. Is mysql text-searching the entire table under the hood, or does it use the PK to reduce the dataset before the text-search. Thanks again! -Dan On 5/11/05, Dathan Pattishall [EMAIL PROTECTED] wrote: Use one big table. A merge table will run the same query over all 10 tables. The key buffer is filled from the top down so He is using a fulltext index he can't use merge tables. If he where to UNION across the tables being used assuming he uses the tables that only have the data he would get a good boost in performance. The performance comes from a smaller in time lock on the table and only data that is being accessed the most would stay in the buffer. He has to pick a good hash such as date or country code for splitting tables on. To explain why let's look at the key buffer structure. Having multiple tables do not pollute the buffer this is why: mySQL pulls OS MYI file blocks then puts the frequently used blocks in the buffer. The index file itself stores the data in a B-Tree+ (or R-Tree) so a smaller index file means more of it can exist in the keybuffer. Since an index file is associated with a table the most frequently used tables' indexes will have more of the key buffer-that assumed: if the application hashes the table correctly then a boost in performance can be gained by using more memory for index blocks that have more pertinent data then not. If your using INNODB (no FULLTEXT INDEX option) use 1 big table. if you have a key buffer that looks like this: a / \ / \ b c / \ / \ d e f g Almost all queries for that index will be able to use the buffer for 'a' and 'b'. It's not until they get things very narrowed down that you have to actually hit the index files for specific leaves. Creating ten small tables creates ten duplicate 'a' and 'b' sections which polutes the key buffer. -Eric Dathan Pattishall wrote: Use 10 smaller tables and perform a union. It's faster to look up in smaller tables then larger ones-generally. Additionally more of the key_buffer can be used for tables with the most hits over the tables with less hits, making the lookup sustain speed over time. -Original Message- From: Dan Salzer [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 11, 2005 11:46 AM To: mysql@lists.mysql.com Subject: Union Performance Question Hi everyone, I have a question regarding the performance of UNION queries: I need to do a full-text search against a large number of rows. Is it faster to have one table with 10,000,000 text rows and perform one full-text search. Or, am I better off having 10 smaller more managable tables and performing UNION'ed full-text searches against them. A merge table seems like it would be the ideal solution, but the limitation on full-text indexing prohibits it. Thanks! -Dan -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]