Re: SA 3.30 question: redundant index in bayes?

2010-02-13 Thread David Morton
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Matt Kettler wrote:

 A quick diff of the 3.2 and 3.3 versions of these files shows this table
 was changed:
 
 
 CREATE TABLE bayes_token (
   id int(11) NOT NULL default '0',
   token char(5) NOT NULL default '',
   spam_count int(11) NOT NULL default '0',
   ham_count int(11) NOT NULL default '0',
   atime int(11) NOT NULL default '0',
   PRIMARY KEY  (id, token),
   INDEX bayes_token_idx1 (token),- deleted
   INDEX bayes_token_idx2 (id, atime)- renamed idx1
 ) TYPE=MyISAM;
 
 
 
 So token was both a primary key, and an index, which is redundant.

How is that redundant?  If you search for only a token, it would not be
indexed, and would perform very poorly.

In section 7.4.4 of the mysql docs:
http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html

 If the table has a multiple-column index, any leftmost prefix of the
index can be used by the optimizer to find rows. For example, if you
have a three-column index on (col1, col2, col3), you have indexed search
capabilities on (col1), (col1, col2), and (col1, col2, col3).

MySQL cannot use an index if the columns do not form a leftmost prefix
of the index. Suppose that you have the SELECT statements shown here:

SELECT * FROM tbl_name WHERE col1=val1;
SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;

SELECT * FROM tbl_name WHERE col2=val2;
SELECT * FROM tbl_name WHERE col2=val2 AND col3=val3;

If an index exists on (col1, col2, col3), only the first two queries use
the index. The third and fourth queries do involve indexed columns, but
(col2) and (col2, col3) are not leftmost prefixes of (col1, col2, col3).





- --
David Morton morto...@dgrmm.net

Morton Software  Design  http://www.dgrmm.net - Ruby on Rails
 PHP Applications
Maia Mailguard http://www.maiamailguard.com- Spam management
 for mail servers
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iD8DBQFLdtRDUy30ODPkzl0RAhuUAKDLlkErP+nXtPQ6gfHQwOQpBw5e7wCgvn+n
1VsGlPGW6GW9GoJrwE3cTgw=
=PJew
-END PGP SIGNATURE-


Re: SA 3.30 question: redundant index in bayes?

2010-02-13 Thread Henrik K
On Sat, Feb 13, 2010 at 10:33:08AM -0600, David Morton wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 Matt Kettler wrote:
 
  A quick diff of the 3.2 and 3.3 versions of these files shows this table
  was changed:
  
  
  CREATE TABLE bayes_token (
id int(11) NOT NULL default '0',
token char(5) NOT NULL default '',
spam_count int(11) NOT NULL default '0',
ham_count int(11) NOT NULL default '0',
atime int(11) NOT NULL default '0',
PRIMARY KEY  (id, token),
INDEX bayes_token_idx1 (token),- deleted
INDEX bayes_token_idx2 (id, atime)- renamed idx1
  ) TYPE=MyISAM;
  
  
  
  So token was both a primary key, and an index, which is redundant.
 
 How is that redundant?  If you search for only a token, it would not be
 indexed, and would perform very poorly.

As you didn't bother to check SpamAssassin sources, let me clarify it for
you. All the SA queries use id=? AND token=?. If something is changed,
it's usually for a reason. But thanks for the effort anyway.

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5659



Re: SA 3.30 question: redundant index in bayes?

2010-02-13 Thread David Morton
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Henrik K wrote:
 So token was both a primary key, and an index, which is redundant.
 How is that redundant?  If you search for only a token, it would not be
 indexed, and would perform very poorly.
 
 As you didn't bother to check SpamAssassin sources, let me clarify it for
 you. All the SA queries use id=? AND token=?. If something is changed,
 it's usually for a reason. But thanks for the effort anyway.

My bad, I was thinking of id as an autoincrement type... not with its
own semantic meaning.

- --
David Morton morto...@dgrmm.net

Morton Software  Design  http://www.dgrmm.net - Ruby on Rails
 PHP Applications
Maia Mailguard http://www.maiamailguard.com- Spam management
 for mail servers
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iD8DBQFLdvEoUy30ODPkzl0RAiqSAJoDB1ARJ1vGajHwE1pdEFHCbBAhogCgidNP
j86yRAfmlVfXPzPnx0n+bnk=
=9eEV
-END PGP SIGNATURE-


Re: SA 3.30 question: redundant index in bayes?

2010-02-13 Thread Matt Kettler
On 2/13/2010 11:33 AM, David Morton wrote:


  So token was both a primary key, and an index, which is redundant.

 How is that redundant?  If you search for only a token, it would not be
 indexed, and would perform very poorly.

Because it is the primary key, which is by definition, an index!!! (it
is the fastest index too, so any other index on the same column will
just be slower (if used at all))






Re: SA 3.30 question: redundant index in bayes?

2010-02-13 Thread John Hardin

On Sat, 13 Feb 2010, Matt Kettler wrote:


On 2/13/2010 11:33 AM, David Morton wrote:



So token was both a primary key, and an index, which is redundant.


How is that redundant?  If you search for only a token, it would not be
indexed, and would perform very poorly.


Because it is the primary key, which is by definition, an index!!! (it
is the fastest index too, so any other index on the same column will
just be slower (if used at all))



  PRIMARY KEY  (id, token),


token is _not_ the primary key. It is a primary key _member_, but it is 
not the entire primary key.


David's comment is precisely correct. The PK index would not help a search 
on token by itself, as it is not the first member of the PK.


The fact that token is never queried by itself means an index on token is 
not needed, but such an index is _not_ redundant.


--
 John Hardin KA7OHZhttp://www.impsec.org/~jhardin/
 jhar...@impsec.orgFALaholic #11174 pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  End users want eye candy and the ooo's and hhh's experience
  when reading mail. To them email isn't a tool, but an entertainment
  form. -- Steve Lake
---
 9 days until George Washington's 278th Birthday


SA 3.30 question: redundant index in bayes?

2010-02-12 Thread Michael Scheidell
I looked at our bayes schema and at the schema in 
../docs/sql/bayes_mysql.sql  and I can't find the redundant index 
mentioned in the SA 3.30 upgrade/changes documents.


did I miss something? or did I remove it years ago anyway?

--
Michael Scheidell, CTO
Phone: 561-999-5000, x 1259
 *| *SECNAP Network Security Corporation

   * Certified SNORT Integrator
   * 2008-9 Hot Company Award Winner, World Executive Alliance
   * Five-Star Partner Program 2009, VARBusiness
   * Best Anti-Spam Product 2008, Network Products Guide
   * King of Spam Filters, SC Magazine 2008

__
This email has been scanned and certified safe by SpammerTrap(r). 
For Information please see http://www.secnap.com/products/spammertrap/
__  


Re: SA 3.30 question: redundant index in bayes?

2010-02-12 Thread Matt Kettler
On 2/12/2010 2:51 PM, Michael Scheidell wrote:
 I looked at our bayes schema and at the schema in
 ../docs/sql/bayes_mysql.sql  and I can't find the redundant index
 mentioned in the SA 3.30 upgrade/changes documents.

 did I miss something? or did I remove it years ago anyway?


A quick diff of the 3.2 and 3.3 versions of these files shows this table
was changed:


CREATE TABLE bayes_token (
  id int(11) NOT NULL default '0',
  token char(5) NOT NULL default '',
  spam_count int(11) NOT NULL default '0',
  ham_count int(11) NOT NULL default '0',
  atime int(11) NOT NULL default '0',
  PRIMARY KEY  (id, token),
  INDEX bayes_token_idx1 (token),- deleted
  INDEX bayes_token_idx2 (id, atime)- renamed idx1
) TYPE=MyISAM;



So token was both a primary key, and an index, which is redundant.