Bonnetot Jean-Daniel wrote:
Why would human created clear text non-tokenized white list entries
grow big?

Maybe my wording was/is not correct. I don't mean that a human-created
list would grow big. I mean that human-readable (clear text) white list
entries use more space than hashed (tokenized) white list entries.
I'm not an expert, but I don't understand why it would use more space
than a token. Tokens are now char(20); an email address would need at
most char(100), and usually it is no longer than char(60).
For a long address:
# echo "[EMAIL PROTECTED]" | wc -c
44
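
The same measurement can be sketched in Python. Note that `echo ... | wc -c` also counts the trailing newline that `echo` appends, so the address itself is one byte shorter than the number reported. The address below is a made-up placeholder (the real one is redacted in this thread), and `stored_size` is only an illustrative helper.

```python
# Hypothetical example address; the real one in the thread is redacted.
address = "some.fairly.long.user.name@mail.example.org"

TOKEN_WIDTH = 20  # char(20), the token column width mentioned in this thread


def stored_size(value: str) -> int:
    """Bytes needed to store the value as UTF-8 text, without a newline."""
    return len(value.encode("utf-8"))


print(stored_size(address), "bytes as clear text vs", TOKEN_WIDTH, "as a token")
```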



Just to clarify what "should" be used, according to the SMTP RFC:

--- snip ---

           user
              The maximum total length of a user name is 64 characters.

           domain
              The maximum total length of a domain name or number is 64
              characters.

--- snip ---
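
Taken at face value, the limits quoted above give a worst case of 64 + 1 + 64 = 129 characters for a full address, which is more than a char(100) column would hold. A minimal sketch of a length check under exactly those quoted numbers follows; other revisions of the SMTP specification may state different limits, and the function name is purely illustrative.

```python
MAX_USER = 64    # limit for the user part, as quoted above
MAX_DOMAIN = 64  # limit for the domain part, as quoted above

# Worst case under these limits: user + "@" + domain.
MAX_ADDRESS = MAX_USER + 1 + MAX_DOMAIN


def within_quoted_limits(address: str) -> bool:
    """True if both parts of user@domain fit the quoted limits."""
    user, sep, domain = address.rpartition("@")
    if not sep or not user or not domain:
        return False  # not of the form user@domain
    return len(user) <= MAX_USER and len(domain) <= MAX_DOMAIN


print(MAX_ADDRESS)  # 129
```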


char(100) is bigger than char(20), but I don't think that will be a
problem for the load on DSPAM. It will take a little more space, but is
that really so bad?

So, if this is a problem, we could truncate the address to keep the
space usage small.
# echo "[EMAIL PROTECTED]" | cut -c1-40
[EMAIL PROTECTED]
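
One caveat with truncation is that two different addresses sharing the same first 40 characters become indistinguishable. A small sketch, using invented addresses and a hypothetical `truncate` helper that mirrors the `cut` command above:

```python
def truncate(address: str, width: int = 40) -> str:
    """Keep only the first `width` characters of the address."""
    return address[:width]


# Two hypothetical addresses that differ only after character 40.
a = "x" * 40 + "@one.example"
b = "x" * 40 + "@two.example"

print(truncate(a) == truncate(b))  # True: both would match the same entry
```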

Or, are you saying that human created clear text white list entries would be tokenized entries?

No. They could be, but the problem is that tokenized entries cannot
easily be converted back to a human-readable format.
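
To illustrate the point: a tokenized whitelist can still answer "is this sender listed?" by hashing the candidate, but the stored numbers alone do not reveal which addresses they came from. The sketch below uses CRC-32 from Python's `zlib` purely as a stand-in; it is NOT the token function DSPAM uses, and the addresses are invented.

```python
import zlib


def tokenize(address: str) -> int:
    """Stand-in token: CRC-32 of the lower-cased address (not DSPAM's real function)."""
    return zlib.crc32(address.lower().encode("utf-8"))


# Hypothetical whitelist, stored only as tokens.
whitelist = {tokenize("friend@example.org"), tokenize("boss@example.com")}


def is_whitelisted(sender: str) -> bool:
    # Lookup works by hashing the candidate and comparing tokens.
    return tokenize(sender) in whitelist


print(is_whitelisted("Friend@example.org"))
print(sorted(whitelist))  # just numbers; the original addresses are not recoverable from them
```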


In that case I don't see why the entries would need to be tokenized.
Less space usage, smaller indexes, faster processing, etc...
Faster processing...
Without computing tokens, you save time that can be spent matching
(truncated) entries in the DB ;)

In any case how big is big?

Depends on the number of entries. For example, having 1'000 white list
entries in tokenized form could use 1'000 times the size of an integer
or long. Having the same 1'000 white list entries in clear text would
use more storage and memory space.
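
As a rough back-of-the-envelope version of that comparison, assuming an 8-byte integer per token and taking the 44 bytes measured earlier in this thread as a typical clear-text entry (both numbers are assumptions; row and index overhead are ignored):

```python
ENTRIES = 1000
TOKEN_BYTES = 8      # assumption: one 64-bit integer per tokenized entry
ADDRESS_BYTES = 44   # assumption: typical clear-text entry, per the wc -c figure above

token_total = ENTRIES * TOKEN_BYTES
text_total = ENTRIES * ADDRESS_BYTES

print(token_total, text_total)  # 8000 44000
```

Under these assumptions the clear-text list is a few tens of kilobytes larger per 1'000 entries.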

I'm not familiar with CSS.
It's well documented in CRM114. But that's not the issue. The main
issue I see is that we as the DSPAM community cannot just ignore the
fact that we have more than one storage engine used in DSPAM. Adding
a new and vital function to only some of the engines is not very
consistent. I know that we already have such stuff (for example
the preference extension), but if we can avoid splitting
functionality, then we should aim for that goal (that's my own
personal opinion).


Perhaps for whitelisting there could be a choice (a dspam.conf
option?) to read per-user whitelist information from either a file or
the DB.  Then document, both in dspam.conf and in the docs, that only
the file option can be used for whitelist management if CSS is the
backend DB in use.

This sounds like a plan :)
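
The selection logic proposed above might look roughly like the sketch below. Everything in it is hypothetical: no such dspam.conf option exists as far as this thread shows, and the values "file" and "db" and the backend labels "css" and "mysql" are placeholders chosen for illustration.

```python
def whitelist_source(option: str, backend: str) -> str:
    """Pick where per-user whitelist entries are read from.

    `option` is the value of a hypothetical dspam.conf setting ("file" or "db");
    `backend` is a placeholder label for the storage backend in use.
    """
    if option not in ("file", "db"):
        raise ValueError(f"unknown whitelist source: {option!r}")
    # Per the proposal, a CSS backend can only use the file-based whitelist.
    if backend == "css" and option == "db":
        raise ValueError("DB whitelist is not available with the CSS backend")
    return option


print(whitelist_source("file", "css"))   # file
print(whitelist_source("db", "mysql"))   # db
```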

--
 .`'`.   BONNETOT Jean-Daniel
: ': : `. ` .` PRIVIANET
  `'`    Sys & Net Admin








--
ci.fct.unl.pt:~# cat .signature

Hugo Monteiro
Email    : [EMAIL PROTECTED]
Telefone : +351 212948300 Ext.15307

Centro de Informática
Faculdade de Ciências e Tecnologia da
                   Universidade Nova de Lisboa
Quinta da Torre   2829-516 Caparica   Portugal
Telefone: +351 212948596   Fax: +351 212948548
www.ci.fct.unl.pt             [EMAIL PROTECTED]

ci.fct.unl.pt:~# _

