Did this message get lost? I don't know as I didn't get any answer. And
it could be valuable for performance & storage used.
On Montag 09 Februar 2009 Michael Monnerie wrote:
> On Freitag 06 Februar 2009 Paul J Stevens wrote:
> > Interesting idea. VARCHAR for mysql can be 0xffff (65535) bytes
> > wide. And that is wide enough for me :-) But alas. At least on
> > mysql-5.0 you can't create a full width index on such wide fields.
> > The index only covers the first 255 bytes.
>
> Hm, shit. Didn't think about index length limits.
>
> > > ERROR: length for type varchar cannot exceed 10485760
> >
> > That's insane. Can you create a full index?
>
> I believe a single index must fit into PAGESIZE, which is 4KB and can
> only be changed at compile time (which isn't recommended and of
> course nobody would do that, including me).
>
> > > Could be OK, but please let's look if the 10MB strlen of
> > > PostgreSQL are also possible in the other supported DBs. Or at
> > > least, if the max possible would be long enough for a
> > > headervalue. What's the max here anyway? 4KB? 64KB? Couldn't we
> > > just cut the headervalue at 4KB? After all, this is only used for
> > > searching, right?
> >
> > Effectively the max width is 255. But you're right. It's *only*
> > used for searching. So personally, I'd be satisfied with storing
> > just the 255 chars. But I don't mean to be a lazy bum.
>
> I did a quick search:
> select headername,headername_id,count(1) from dbmail_headervalue v
> join dbmail_headername n on (v.headername_id=n.id) where
> length(headervalue)>255 group by headername,headername_id order by
> count;
>
> Most of those headervalues > 255 are bullshit anyway (in the sense
> of: will probably never be searched), but there are also some TO:,
> CC: , BCC:, Subject: Headers. A search should possibly be able to
> find that. Is there any chance to make a difference, lets say, the
> client says "search all mails where 'karl may' is mentioned". If he
> does that, we only search via the index. But if he says "search all
> headers where 'karl may' is mentioned", we search the full
> headervalues? That would be a nice compromise.
>
> *Side note*:
> Aren't most searches of the form "%searchstring%"? If yes,
> the index is useless, as you can only search "searchstring%" to use
> the index. And if the index can't be used, it could be dropped
> altogether. I did some searches with Kmail, the string I searched for
> shows what I told Kmail to search (ZMI_*, e.g. ZMI_SUBJECT searched
> for subject)
>
> SELECT message_idnr FROM dbmail_messages m JOIN dbmail_physmessage p
> ON m.physmessage_id=p.id JOIN dbmail_headervalue v ON
> v.physmessage_id=p.id JOIN dbmail_headername n ON
> v.headername_id=n.id WHERE mailbox_idnr = 3324 AND status IN (0,1)
> AND headername ILIKE 'To' AND
> SUBSTRING(headervalue,0,255) ILIKE '%ZMI_ALL_RECIPIENT%' ORDER BY
> message_idnr
> (the same again but instead 'To' 'Cc', and then a 3rd time with
> 'Bcc')
>
> SELECT m.message_idnr,k.messageblk FROM dbmail_messageblks k JOIN
> dbmail_physmessage p ON k.physmessage_id = p.id JOIN dbmail_messages
> m ON p.id = m.physmessage_id WHERE mailbox_idnr = 3324 AND status IN
> (0,1 ) AND k.is_header = '0' GROUP BY m.message_idnr,k.messageblk
> HAVING ENCODE(k.messageblk::bytea,'escape') LIKE
> '%ZMI_MessageContent%'
>
> SELECT message_idnr FROM dbmail_messages m JOIN dbmail_physmessage p
> ON m.physmessage_id=p.id JOIN dbmail_headervalue v ON
> v.physmessage_id=p.id JOIN dbmail_headername n ON
> v.headername_id=n.id WHERE mailbox_idnr = 3324 AND status IN (0,1)
> AND headername ILIKE 'Subject' AND
> SUBSTRING(headervalue,0,255) ILIKE '%ZMI_SUBJECT%' ORDER BY
> message_idnr
>
> SELECT message_idnr FROM dbmail_messages m JOIN dbmail_physmessage p
> ON m.physmessage_id=p.id JOIN dbmail_headervalue v ON
> v.physmessage_id=p.id JOIN dbmail_headername n ON
> v.headername_id=n.id WHERE mailbox_idnr = 3324 AND status IN (0,1)
> AND headername ILIKE 'From' AND
> SUBSTRING(headervalue,0,255) ILIKE '%ZMI_FROM%' ORDER BY message_idnr
>
> SELECT message_idnr FROM dbmail_messages m JOIN dbmail_physmessage p
> ON m.physmessage_id=p.id JOIN dbmail_headervalue v ON
> v.physmessage_id=p.id JOIN dbmail_headername n ON
> v.headername_id=n.id WHERE mailbox_idnr = 3324 AND status IN (0,1)
> AND headername ILIKE 'List-Id' AND
> SUBSTRING(headervalue,0,255) ILIKE '%ZMI_LIST-ID%' ORDER BY
> message_idnr
>
> SELECT m.message_idnr, v.headervalue, k.messageblk FROM
> dbmail_messageblks k JOIN dbmail_physmessage p ON
> k.physmessage_id=p.id JOIN dbmail_messages m ON m.physmessage_id=p.id
> JOIN dbmail_headervalue v ON v.physmessage_id=p.id WHERE
> m.mailbox_idnr=3324 AND m.status in (0,1) HAVING
> SUBSTRING(v.headervalue,0,255) ILIKE '%ZMI_FullMessage%' OR
> k.messageblk ILIKE '%ZMI_FullMessage%'
>
> You see? I found 3 things here:
>
> 1) The last SELECT produced an error on PostgreSQL 8.3, which I will
> report in a different message
>
> 2) Every single search was of the form "%searchtext%", so no index
> could be used. I mean, in dbmail 2.2 the index
> »dbmail_headervalue_3« btree ("substring"(headervalue, 0, 255))
> was never actually used and is therefore useless. And I guess 99% of
> all searches will be performed with "where X contains Y", and that's
> why it will have to be searched as X ILIKE %Y%. We could drop the
> index and the speed wil be increased, as a never used index only
> costs performance.
>
> Or is there any client that performs a search like Y%? I guess no. At
> least, when I search for messages I always search "where X contains
> Y" and not "where X begins with Y".
>
> 3) Searches are performed only on substring(headervalue,0,255). The
> whole "headervalue" is never actually searched, and nobody seems to
> have found that. This means either
> a) people don't search in such long headervalues
> or
> b) people were searching but not finding messages and wondering,
> still they didn't report because they can't believe the computer is
> wrong.
>
> mfg zmi
mfg zmi
--
// Michael Monnerie, Ing.BSc ----- http://it-management.at
// Tel: 0660 / 415 65 31 .network.your.ideas.
// PGP Key: "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38 500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net Key-ID: 1C1209B4
_______________________________________________
Dbmail-dev mailing list
[email protected]
http://twister.fastxs.net/mailman/listinfo/dbmail-dev