Michael,
It didn't get lost. I'm eagerly awaiting patches suitable for merging :-)
Michael Monnerie wrote:
> Did this message get lost? I don't know as I didn't get any answer. And
> it could be valuable for performance & storage used.
>
> On Montag 09 Februar 2009 Michael Monnerie wrote:
>> On Freitag 06 Februar 2009 Paul J Stevens wrote:
>>> Interesting idea. VARCHAR for mysql can be 0xffff (65535) bytes
>>> wide. And that is wide enough for me :-) But alas. At least on
>>> mysql-5.0 you can't create a full width index on such wide fields.
>>> The index only covers the first 255 bytes.
>> Hm, shit. Didn't think about index length limits.
>>
>>>> ERROR: length for type varchar cannot exceed 10485760
>>> That's insane. Can you create a full index?
>> I believe a single index must fit into PAGESIZE, which is 4KB and can
>> only be changed at compile time (which isn't recommended and of
>> course nobody would do that, including me).
>>
>>>> Could be OK, but please let's look if the 10MB strlen of
>>>> PostgreSQL are also possible in the other supported DBs. Or at
>>>> least, if the max possible would be long enough for a
>>>> headervalue. What's the max here anyway? 4KB? 64KB? Couldn't we
>>>> just cut the headervalue at 4KB? After all, this is only used for
>>>> searching, right?
>>> Effectively the max width is 255. But you're right. It's *only*
>>> used for searching. So personally, I'd be satisfied with storing
>>> just the 255 chars. But I don't mean to be a lazy bum.
>> I did a quick search:
>> select headername,headername_id,count(1) from dbmail_headervalue v
>> join dbmail_headername n on (v.headername_id=n.id) where
>> length(headervalue)>255 group by headername,headername_id order by
>> count;
>>
>> Most of those headervalues > 255 are bullshit anyway (in the sense
>> of: will probably never be searched), but there are also some TO:,
>> CC: , BCC:, Subject: Headers. A search should possibly be able to
>> find that. Is there any chance to make a difference, lets say, the
>> client says "search all mails where 'karl may' is mentioned". If he
>> does that, we only search via the index. But if he says "search all
>> headers where 'karl may' is mentioned", we search the full
>> headervalues? That would be a nice compromise.
>>
>> *Side note*:
>> Aren't most searches of the form "%searchstring%"? If yes,
>> the index is useless, as you can only search "searchstring%" to use
>> the index. And if the index can't be used, it could be dropped
>> altogether. I did some searches with Kmail, the string I searched for
>> shows what I told Kmail to search (ZMI_*, e.g. ZMI_SUBJECT searched
>> for subject)
>>
>> SELECT message_idnr FROM dbmail_messages m JOIN dbmail_physmessage p
>> ON m.physmessage_id=p.id JOIN dbmail_headervalue v ON
>> v.physmessage_id=p.id JOIN dbmail_headername n ON
>> v.headername_id=n.id WHERE mailbox_idnr = 3324 AND status IN (0,1)
>> AND headername ILIKE 'To' AND
>> SUBSTRING(headervalue,0,255) ILIKE '%ZMI_ALL_RECIPIENT%' ORDER BY
>> message_idnr
>> (the same again but instead 'To' 'Cc', and then a 3rd time with
>> 'Bcc')
>>
>> SELECT m.message_idnr,k.messageblk FROM dbmail_messageblks k JOIN
>> dbmail_physmessage p ON k.physmessage_id = p.id JOIN dbmail_messages
>> m ON p.id = m.physmessage_id WHERE mailbox_idnr = 3324 AND status IN
>> (0,1 ) AND k.is_header = '0' GROUP BY m.message_idnr,k.messageblk
>> HAVING ENCODE(k.messageblk::bytea,'escape') LIKE
>> '%ZMI_MessageContent%'
>>
>> SELECT message_idnr FROM dbmail_messages m JOIN dbmail_physmessage p
>> ON m.physmessage_id=p.id JOIN dbmail_headervalue v ON
>> v.physmessage_id=p.id JOIN dbmail_headername n ON
>> v.headername_id=n.id WHERE mailbox_idnr = 3324 AND status IN (0,1)
>> AND headername ILIKE 'Subject' AND
>> SUBSTRING(headervalue,0,255) ILIKE '%ZMI_SUBJECT%' ORDER BY
>> message_idnr
>>
>> SELECT message_idnr FROM dbmail_messages m JOIN dbmail_physmessage p
>> ON m.physmessage_id=p.id JOIN dbmail_headervalue v ON
>> v.physmessage_id=p.id JOIN dbmail_headername n ON
>> v.headername_id=n.id WHERE mailbox_idnr = 3324 AND status IN (0,1)
>> AND headername ILIKE 'From' AND
>> SUBSTRING(headervalue,0,255) ILIKE '%ZMI_FROM%' ORDER BY message_idnr
>>
>> SELECT message_idnr FROM dbmail_messages m JOIN dbmail_physmessage p
>> ON m.physmessage_id=p.id JOIN dbmail_headervalue v ON
>> v.physmessage_id=p.id JOIN dbmail_headername n ON
>> v.headername_id=n.id WHERE mailbox_idnr = 3324 AND status IN (0,1)
>> AND headername ILIKE 'List-Id' AND
>> SUBSTRING(headervalue,0,255) ILIKE '%ZMI_LIST-ID%' ORDER BY
>> message_idnr
>>
>> SELECT m.message_idnr, v.headervalue, k.messageblk FROM
>> dbmail_messageblks k JOIN dbmail_physmessage p ON
>> k.physmessage_id=p.id JOIN dbmail_messages m ON m.physmessage_id=p.id
>> JOIN dbmail_headervalue v ON v.physmessage_id=p.id WHERE
>> m.mailbox_idnr=3324 AND m.status in (0,1) HAVING
>> SUBSTRING(v.headervalue,0,255) ILIKE '%ZMI_FullMessage%' OR
>> k.messageblk ILIKE '%ZMI_FullMessage%'
>>
>> You see? I found 3 things here:
>>
>> 1) The last SELECT produced an error on PostgreSQL 8.3, which I will
>> report in a different message
>>
>> 2) Every single search was of the form "%searchtext%", so no index
>> could be used. I mean, in dbmail 2.2 the index
>> »dbmail_headervalue_3« btree ("substring"(headervalue, 0, 255))
>> was never actually used and is therefore useless. And I guess 99% of
>> all searches will be performed with "where X contains Y", and that's
>> why it will have to be searched as X ILIKE %Y%. We could drop the
>> index and the speed wil be increased, as a never used index only
>> costs performance.
>>
>> Or is there any client that performs a search like Y%? I guess no. At
>> least, when I search for messages I always search "where X contains
>> Y" and not "where X begins with Y".
>>
>> 3) Searches are performed only on substring(headervalue,0,255). The
>> whole "headervalue" is never actually searched, and nobody seems to
>> have found that. This means either
>> a) people don't search in such long headervalues
>> or
>> b) people were searching but not finding messages and wondering,
>> still they didn't report because they can't believe the computer is
>> wrong.
>>
>> mfg zmi
>
>
> mfg zmi
--
________________________________________________________________
Paul Stevens paul at nfg.nl
NET FACILITIES GROUP GPG/PGP: 1024D/11F8CD31
The Netherlands________________________________http://www.nfg.nl
_______________________________________________
Dbmail-dev mailing list
[email protected]
http://twister.fastxs.net/mailman/listinfo/dbmail-dev