R: Auto_increment vs SERIAL key types

2006-10-11 Thread Giampaolo Tomassoni
  Of course, the underlying sql engine has to support views 
 (5.0, but I use 4.1)
  and, most important, updates to a view. Maybe I'm wrong, but 
  this is something that mysql doesn't do. Besides, that's one 
  of the reasons for which I prefer much more postgresql.
  
 But postgresql doen't support replication, does it?
 Oh, there seem to be a bunch of add on products, but that's not the same
 as postgres supporting it.

Slony1 is basicly the official replicating engine.

Anyway, adding a primary-key, SERIAL column to the per-server tables you may 
even do it yourself using a shell and a sql script: the table setup is made 
in such a way that you can't get conflicts on the per-server tables...

giampaolo



Auto_increment vs SERIAL key types

2006-10-10 Thread Michael Scheidell
I am experimenting with mysql replication, and have done some research
on key collisions in the case of a 'load balancing' situation (live sql
servers running on each amavisd server), using either same mx weight, or
VRRP/CARP, heartbeat, virtual ip type setups.  'random' smtp connections
could hit each server, and each server has a local mysql DB, in a dual
master/slave replication setup. (updates to either db propagate to the
other, works fine, creates lots of traffic, so maybe use a second nic
and an xover cable..)

My concern is over use of SERIAL keys in amavisd-new tables, vs
AUTO_INCREMENT keys.
(are SERIAL keys an alias for AUTO_INCREMENT? Are SERIAL keys safe in
replication situations?)

I have seen documentation saying that 'auto_increment' works as expected
in replication situations, but can't find any information on SERIAL
keys.

http://www.weberdev.com/Manuals/MySQL3.X_4.X/replication.html#replicatio
n-features

Another issue may be AWL files, (I suppose a spamassassin question
also?).  Every 'new' ip/email incoming will create a new  PRIMARY KEY
(username,email,ip).  If two connections, one on each box, first one
wins, replication stops and you need to manually issue a bunch of
commands to skip (two?) transactions and restart slave.

 --slave-skip-errors=[err_code1,err_code2,... | all]

Normally, replication stops when an error occurs, which gives you the
opportunity to resolve the inconsistency in the data manually. This
option tells the slave SQL thread to continue replication when a
statement returns any of the errors listed in the option value.

Do not use this option unless you fully understand why you are getting
errors. If there are no bugs in your replication setup and client
programs, and no bugs in MySQL itself, an error that stops replication
should never occur. Indiscriminate use of this option results in slaves
becoming hopelessly out of sync with the master, with you having no idea
why this has occurred

I am using Innodb DB type on Freebsd5, and mysql 4.1.20ish.


-- 
Michael Scheidell, CTO
561-999-5000, ext 1131
SECNAP Network Security Corporation
Keep up to date with latest information on IT security: Real time
security alerts: http://www.secnap.com/news



Re: Auto_increment vs SERIAL key types

2006-10-10 Thread Michael Scheidell
Michael Scheidell wrote:
 I am experimenting with mysql replication, and have done some research
 on key collisions in the case of a 'load balancing' situation (live sql
 servers running on each amavisd server), using either same mx weight, or
 VRRP/CARP, heartbeat, virtual ip type setups.  'random' smtp connections
 could hit each server, and each server has a local mysql DB, in a dual
 master/slave replication setup. (updates to either db propagate to the
 other, works fine, creates lots of traffic, so maybe use a second nic
 and an xover cable..)



 Another issue may be AWL files, (I suppose a spamassassin question
 also?).  Every 'new' ip/email incoming will create a new  PRIMARY KEY
 (username,email,ip).  If two connections, one on each box, first one
 wins, replication stops and you need to manually issue a bunch of
 commands to skip (two?) transactions and restart slave.

   
and I suppose the Bayesian files also:

duplicate key exists  - I could go in and stop the slave, delete rows
similar to the token, start the slave again, and usually it would move
on but due to not being able to correctly copy and paste the binary data
to search with, sometimes 0 rows would be deleted and sometimes hundreds
would be deleted. This probably had some effect on the quality of the
filtering so I gave up on this approach.

http://ckdake.com/node/64


-- 
Michael Scheidell, CTO
SECNAP Network Security / www.secnap.com
[EMAIL PROTECTED]  / 1+561-999-5000, x 1131



Re: Auto_increment vs SERIAL key types

2006-10-10 Thread SM

At 06:14 10-10-2006, Michael Scheidell wrote:

I am experimenting with mysql replication, and have done some research
on key collisions in the case of a 'load balancing' situation (live sql


[snip]



My concern is over use of SERIAL keys in amavisd-new tables, vs
AUTO_INCREMENT keys.
(are SERIAL keys an alias for AUTO_INCREMENT? Are SERIAL keys safe in
replication situations?)


It's an alias for BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE.

See auto_increment_increment and auto_increment_offset (MySQL 5.x).

Regards,
-sm 



R: Auto_increment vs SERIAL key types

2006-10-10 Thread Giampaolo Tomassoni
 Another issue may be AWL files, (I suppose a spamassassin question
 also?).  Every 'new' ip/email incoming will create a new  PRIMARY KEY
 (username,email,ip).  If two connections, one on each box, first one
 wins, replication stops and you need to manually issue a bunch of
 commands to skip (two?) transactions and restart slave.

To my opinion, the best way to implement awl is to have a table for each server 
which is basicly one-way replicated (from the only originating server to the 
others in the cluster). The table is to be made up of the fields timestamp, 
username, email, ip, and score. Please note I sayd just score, not count + 
totscore.

Then, the database may offer a view which merges the tables replicated from the 
various servers (the one managed by the server and the ones managed by the 
other servers) in such a way that spamassassin may simply access it like a 
standard awl table. Ie, something like:

select username, email, ip, count(*) as count, sum(score) as totscore
from (
select username, email, ip, score from awl0
union all select username, email, ip, score from awl1
...
union all select username, email, ip, score from awlN
)
group by username, mail, ip

The view should be made in such a way that an insert or an update into it would 
automatically trigger an insert in the awl table managed by the server.

Of course, the underlying sql engine has to support views and, most important, 
updates to a view. Maybe I'm wrong, but this is something that mysql doesn't 
do. Besides, that's one of the reasons for which I prefer much more postgresql.

You may see that the timestamp field is defined but never used. The idea is 
that the timestamp field is meant to record the time at which a new entry 
entered into the database. This way one may also implement some methods to 
delete stale entries. Ie.: suppose a source (email+ip pair) was used to send 
mostly ham and it did does for, say, one year. It may have reached a very high 
totscore and count. Well, now suppose your reliable source started sending a 
lot of spam. Would you like to have to wait a month or so before its 
whitelistening score would start to lower enough to allow the spam detector not 
to pass that stuff? Well, no. One may, in example, have a sql script run, say, 
hourly from a cron job which deletes awl entries older than, say, three months.

Do you like it?

---
Giampaolo Tomassoni - IT Consultant
Piazza VIII Aprile 1948, 4
I-53044 Chiusi (SI) - Italy
Ph: +39-0578-21100



RE: Auto_increment vs SERIAL key types

2006-10-10 Thread Michael Scheidell

 -Original Message-
 From: Giampaolo Tomassoni [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, October 10, 2006 3:25 PM
 To: Michael Scheidell; SpamAssassin Users List
 Subject: R: Auto_increment vs SERIAL key types
 
 Of course, the underlying sql engine has to support views 
(5.0, but I use 4.1)
 and, most important, updates to a view. Maybe I'm wrong, but 
 this is something that mysql doesn't do. Besides, that's one 
 of the reasons for which I prefer much more postgresql.
 
But postgresql doen't support replication, does it?
Oh, there seem to be a bunch of add on products, but that's not the same
as postgres supporting it.





RE: Auto_increment vs SERIAL key types

2006-10-10 Thread Michael Scheidell

 -Original Message-
 From: SM [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, October 10, 2006 2:08 PM
 To: SpamAssassin Users List
 Subject: Re: Auto_increment vs SERIAL key types
 
 
 At 06:14 10-10-2006, Michael Scheidell wrote:
 I am experimenting with mysql replication, and have done 
 some research 
 on key collisions in the case of a 'load balancing' 
 situation (live sql
 
 [snip]
 
 
 My concern is over use of SERIAL keys in amavisd-new tables, vs 
 AUTO_INCREMENT keys. (are SERIAL keys an alias for 
 AUTO_INCREMENT? Are 
 SERIAL keys safe in replication situations?)
 
 It's an alias for BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE.

Looks like with the bayes and awl collisions PROBABAL with live
replication, its not such a great idea.
(neither bayes now awl use serial.  Other scheams may work, maybe with
views in mysql 5,  maybe create an underlying table with enough columns
that replication won't break, and put in a view that SA wants to use..
Or hack SA?

Maybe auto replicate the users preferences tables only?



 
 See auto_increment_increment and auto_increment_offset (MySQL 5.x).
 
 Regards,
 -sm