Re: SA 3.30 question: redundant index in bayes?
On Sat, 13 Feb 2010, Matt Kettler wrote: On 2/13/2010 11:33 AM, David Morton wrote: So token was both a primary key, and an index, which is redundant. How is that redundant? If you search for only a token, it would not be indexed, and would perform very poorly. Because it is the primary key, which is by definition, an index!!! (it is the fastest index too, so any other index on the same column will just be slower (if used at all)) PRIMARY KEY (id, token), "token" is _not_ the primary key. It is a primary key _member_, but it is not the entire primary key. David's comment is precisely correct. The PK index would not help a search on token by itself, as it is not the first member of the PK. The fact that token is never queried by itself means an index on token is not needed, but such an index is _not_ redundant. -- John Hardin KA7OHZhttp://www.impsec.org/~jhardin/ jhar...@impsec.orgFALaholic #11174 pgpk -a jhar...@impsec.org key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79 --- End users want eye candy and the "ooo's and hhh's" experience when reading mail. To them email isn't a tool, but an entertainment form. -- Steve Lake --- 9 days until George Washington's 278th Birthday
Re: SA 3.30 question: redundant index in bayes?
On 2/13/2010 11:33 AM, David Morton wrote: > > > > So token was both a primary key, and an index, which is redundant. > > How is that redundant? If you search for only a token, it would not be > indexed, and would perform very poorly. Because it is the primary key, which is by definition, an index!!! (it is the fastest index too, so any other index on the same column will just be slower (if used at all))
Re: SA 3.30 question: redundant index in bayes?
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Henrik K wrote: >>> So token was both a primary key, and an index, which is redundant. >> How is that redundant? If you search for only a token, it would not be >> indexed, and would perform very poorly. > > As you didn't bother to check SpamAssassin sources, let me clarify it for > you. All the SA queries use "id=? AND token=?". If something is changed, > it's usually for a reason. But thanks for the effort anyway. My bad, I was thinking of id as an autoincrement type... not with its own semantic meaning. - -- David Morton Morton Software & Design http://www.dgrmm.net - Ruby on Rails PHP Applications Maia Mailguard http://www.maiamailguard.com- Spam management for mail servers -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (Darwin) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iD8DBQFLdvEoUy30ODPkzl0RAiqSAJoDB1ARJ1vGajHwE1pdEFHCbBAhogCgidNP j86yRAfmlVfXPzPnx0n+bnk= =9eEV -END PGP SIGNATURE-
Re: SA 3.30 question: redundant index in bayes?
On Sat, Feb 13, 2010 at 10:33:08AM -0600, David Morton wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > Matt Kettler wrote: > > > A quick diff of the 3.2 and 3.3 versions of these files shows this table > > was changed: > > > > > > CREATE TABLE bayes_token ( > > id int(11) NOT NULL default '0', > > token char(5) NOT NULL default '', > > spam_count int(11) NOT NULL default '0', > > ham_count int(11) NOT NULL default '0', > > atime int(11) NOT NULL default '0', > > PRIMARY KEY (id, token), > > INDEX bayes_token_idx1 (token),<- deleted > > INDEX bayes_token_idx2 (id, atime)<- renamed idx1 > > ) TYPE=MyISAM; > > > > > > > > So token was both a primary key, and an index, which is redundant. > > How is that redundant? If you search for only a token, it would not be > indexed, and would perform very poorly. As you didn't bother to check SpamAssassin sources, let me clarify it for you. All the SA queries use "id=? AND token=?". If something is changed, it's usually for a reason. But thanks for the effort anyway. https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5659
Re: SA 3.30 question: redundant index in bayes?
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Matt Kettler wrote: > A quick diff of the 3.2 and 3.3 versions of these files shows this table > was changed: > > > CREATE TABLE bayes_token ( > id int(11) NOT NULL default '0', > token char(5) NOT NULL default '', > spam_count int(11) NOT NULL default '0', > ham_count int(11) NOT NULL default '0', > atime int(11) NOT NULL default '0', > PRIMARY KEY (id, token), > INDEX bayes_token_idx1 (token),<- deleted > INDEX bayes_token_idx2 (id, atime)<- renamed idx1 > ) TYPE=MyISAM; > > > > So token was both a primary key, and an index, which is redundant. How is that redundant? If you search for only a token, it would not be indexed, and would perform very poorly. In section 7.4.4 of the mysql docs: http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to find rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3). MySQL cannot use an index if the columns do not form a leftmost prefix of the index. Suppose that you have the SELECT statements shown here: SELECT * FROM tbl_name WHERE col1=val1; SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2; SELECT * FROM tbl_name WHERE col2=val2; SELECT * FROM tbl_name WHERE col2=val2 AND col3=val3; If an index exists on (col1, col2, col3), only the first two queries use the index. The third and fourth queries do involve indexed columns, but (col2) and (col2, col3) are not leftmost prefixes of (col1, col2, col3). - -- David Morton Morton Software & Design http://www.dgrmm.net - Ruby on Rails PHP Applications Maia Mailguard http://www.maiamailguard.com- Spam management for mail servers -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (Darwin) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iD8DBQFLdtRDUy30ODPkzl0RAhuUAKDLlkErP+nXtPQ6gfHQwOQpBw5e7wCgvn+n 1VsGlPGW6GW9GoJrwE3cTgw= =PJew -END PGP SIGNATURE-
Re: SA 3.30 question: redundant index in bayes?
On 2/12/2010 2:51 PM, Michael Scheidell wrote: > I looked at our bayes schema and at the schema in > ../docs/sql/bayes_mysql.sql and I can't find the redundant index > mentioned in the SA 3.30 upgrade/changes documents. > > did I miss something? or did I remove it years ago anyway? > A quick diff of the 3.2 and 3.3 versions of these files shows this table was changed: CREATE TABLE bayes_token ( id int(11) NOT NULL default '0', token char(5) NOT NULL default '', spam_count int(11) NOT NULL default '0', ham_count int(11) NOT NULL default '0', atime int(11) NOT NULL default '0', PRIMARY KEY (id, token), INDEX bayes_token_idx1 (token),<- deleted INDEX bayes_token_idx2 (id, atime)<- renamed idx1 ) TYPE=MyISAM; So token was both a primary key, and an index, which is redundant.