On Freitag 06 Februar 2009 Paul J Stevens wrote:
> Interesting idea. VARCHAR for mysql can be 0xffff (65535) bytes wide.
> And that is wide enough for me :-) But alas. At least on mysql-5.0
> you can't create a full width index on such wide fields. The index
> only covers the first 255 bytes.
Hm, shit. Didn't think about index length limits.
> > ERROR: length for type varchar cannot exceed 10485760
>
> That's insane. Can you create a full index?
I believe a single index must fit into PAGESIZE, which is 4KB and can
only be changed at compile time (which isn't recommended and of course
nobody would do that, including me).
> > Could be OK, but please let's look if the 10MB strlen of PostgreSQL
> > are also possible in the other supported DBs. Or at least, if the
> > max possible would be long enough for a headervalue. What's the max
> > here anyway? 4KB? 64KB? Couldn't we just cut the headervalue at
> > 4KB? After all, this is only used for searching, right?
>
> Effectively the max width is 255. But you're right. It's *only* used
> for searching. So personally, I'd be satisfied with storing just the
> 255 chars. But I don't mean to be a lazy bum.
I did a quick search:
select headername,headername_id,count(1) from dbmail_headervalue v join
dbmail_headername n on (v.headername_id=n.id) where
length(headervalue)>255 group by headername,headername_id order by
count;
Most of those headervalues > 255 are bullshit anyway (in the sense of:
will probably never be searched), but there are also some TO:, CC: ,
BCC:, Subject: Headers. A search should possibly be able to find that.
Is there any chance to make a difference, lets say, the client says
"search all mails where 'karl may' is mentioned". If he does that, we
only search via the index. But if he says "search all headers where
'karl may' is mentioned", we search the full headervalues? That would be
a nice compromise.
*Side note*:
Aren't most searches of the form "%searchstring%"? If yes,
the index is useless, as you can only search "searchstring%" to use the
index. And if the index can't be used, it could be dropped altogether. I
did some searches with Kmail, the string I searched for shows what I
told Kmail to search (ZMI_*, e.g. ZMI_SUBJECT searched for subject)
SELECT message_idnr FROM dbmail_messages m JOIN dbmail_physmessage p ON
m.physmessage_id=p.id JOIN dbmail_headervalue v ON v.physmessage_id=p.id
JOIN dbmail_headername n ON v.headername_id=n.id WHERE mailbox_idnr =
3324 AND status IN (0,1) AND headername ILIKE 'To' AND
SUBSTRING(headervalue,0,255) ILIKE '%ZMI_ALL_RECIPIENT%' ORDER BY
message_idnr
(the same again but instead 'To' 'Cc', and then a 3rd time with 'Bcc')
SELECT m.message_idnr,k.messageblk FROM dbmail_messageblks k JOIN
dbmail_physmessage p ON k.physmessage_id = p.id JOIN dbmail_messages m
ON p.id = m.physmessage_id WHERE mailbox_idnr = 3324 AND status IN (0,1
) AND k.is_header = '0' GROUP BY m.message_idnr,k.messageblk HAVING
ENCODE(k.messageblk::bytea,'escape') LIKE '%ZMI_MessageContent%'
SELECT message_idnr FROM dbmail_messages m JOIN dbmail_physmessage p ON
m.physmessage_id=p.id JOIN dbmail_headervalue v ON v.physmessage_id=p.id
JOIN dbmail_headername n ON v.headername_id=n.id WHERE mailbox_idnr =
3324 AND status IN (0,1) AND headername ILIKE 'Subject' AND
SUBSTRING(headervalue,0,255) ILIKE '%ZMI_SUBJECT%' ORDER BY message_idnr
SELECT message_idnr FROM dbmail_messages m JOIN dbmail_physmessage p ON
m.physmessage_id=p.id JOIN dbmail_headervalue v ON v.physmessage_id=p.id
JOIN dbmail_headername n ON v.headername_id=n.id WHERE mailbox_idnr =
3324 AND status IN (0,1) AND headername ILIKE 'From' AND
SUBSTRING(headervalue,0,255) ILIKE '%ZMI_FROM%' ORDER BY message_idnr
SELECT message_idnr FROM dbmail_messages m JOIN dbmail_physmessage p ON
m.physmessage_id=p.id JOIN dbmail_headervalue v ON v.physmessage_id=p.id
JOIN dbmail_headername n ON v.headername_id=n.id WHERE mailbox_idnr =
3324 AND status IN (0,1) AND headername ILIKE 'List-Id' AND
SUBSTRING(headervalue,0,255) ILIKE '%ZMI_LIST-ID%' ORDER BY message_idnr
SELECT m.message_idnr, v.headervalue, k.messageblk FROM
dbmail_messageblks k JOIN dbmail_physmessage p ON k.physmessage_id=p.id
JOIN dbmail_messages m ON m.physmessage_id=p.id JOIN dbmail_headervalue
v ON v.physmessage_id=p.id WHERE m.mailbox_idnr=3324 AND m.status in
(0,1) HAVING SUBSTRING(v.headervalue,0,255) ILIKE '%ZMI_FullMessage%' OR
k.messageblk ILIKE '%ZMI_FullMessage%'
You see? I found 3 things here:
1) The last SELECT produced an error on PostgreSQL 8.3, which I will
report in a different message
2) Every single search was of the form "%searchtext%", so no index could
be used. I mean, in dbmail 2.2 the index
»dbmail_headervalue_3« btree ("substring"(headervalue, 0, 255))
was never actually used and is therefore useless. And I guess 99% of all
searches will be performed with "where X contains Y", and that's why it
will have to be searched as X ILIKE %Y%. We could drop the index and the
speed wil be increased, as a never used index only costs performance.
Or is there any client that performs a search like Y%? I guess no. At
least, when I search for messages I always search "where X contains Y"
and not "where X begins with Y".
3) Searches are performed only on substring(headervalue,0,255). The
whole "headervalue" is never actually searched, and nobody seems to have
found that. This means either
a) people don't search in such long headervalues
or
b) people were searching but not finding messages and wondering, still
they didn't report because they can't believe the computer is wrong.
mfg zmi
--
// Michael Monnerie, Ing.BSc ----- http://it-management.at
// Tel: 0660 / 415 65 31 .network.your.ideas.
// PGP Key: "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38 500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net Key-ID: 1C1209B4
_______________________________________________
Dbmail-dev mailing list
[email protected]
http://twister.fastxs.net/mailman/listinfo/dbmail-dev