On Sun, Nov 6, 2016 at 9:27 AM Daniel Gruno <[email protected]> wrote:
> On 11/06/2016 03:18 PM, sebb wrote:
> > Fields such as message-id are stored as text strings, but they are
> > only really intended to be used as ids. They don't contain independent
> > text parts.
> >
> > From what I have understood so far from reading the ES docs, such
> > fields should be tagged as
> >
> > "index": "not_analyzed"
> >
> > AIUI this reduces the analysis overhead and storage requirements, and
> > also makes it harder to find fields with
> > This probably applies to other fields in "mbox":
> >
> > mid
> > possibly in-reply-to
> > also references
> >
> > And of course the auto-created fields such as attachments
> >
> > Likewise the doc types currently missing from setup.py:
> >
> > notifications
> > account
> > mailinglists
> >
> > These are internal use only so are not intended for searching.
> >
> > Or have I got this completely wrong?
> >
>
> message-id is set to not be analyzed, by the setup script (it's in the
> mappings it sends to ES when creating the index). mid and in-reply-to
> should probably also be not analyzed, although mid is really a copy of
> the doc ID, IIRC. the list ID is also not analyzed by default (as
> list_raw), neither is the raw from address
>

So I notice the query process is an arbitrary full-text query, which runs
against _all:

https://github.com/apache/incubator-ponymail/blob/master/site/api/lib/elastic.lua#L44

Unless I need to dig into it a bit further to see whether something builds
up the query a bit differently, that means most of these mappings are moot.
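
To make the distinction concrete, here is a rough sketch in Python (since setup.py is where the mappings live). It is not the actual Pony Mail mapping or query code. The field list is taken from this thread, the helper names and the sample message-id value are made up for illustration, and the mapping shape assumes the ES 1.x/2.x "string" type syntax. It builds a not_analyzed mapping for the id-like fields, then contrasts a query_string query with no field (which, as far as I understand, defaults to _all on those versions) with a term query aimed at one of the not_analyzed fields.

```python
import json

# Id-like fields named in this thread (assumption: all live in the "mbox" type).
ID_FIELDS = ["message-id", "mid", "in-reply-to", "references"]


def id_field_mappings(fields):
    """Build ES 1.x/2.x-style string mappings that skip analysis."""
    return {name: {"type": "string", "index": "not_analyzed"} for name in fields}


def full_text_query(text):
    """query_string with no field given; per-field mappings are not targeted."""
    return {"query": {"query_string": {"query": text}}}


def exact_id_query(field, value):
    """term query against a single field; the value is matched as-is."""
    return {"query": {"term": {field: value}}}


# Hypothetical mapping body, only to show the shape.
mbox_mapping = {"mbox": {"properties": id_field_mappings(ID_FIELDS)}}

print(json.dumps(mbox_mapping, indent=2))
print(json.dumps(full_text_query("not_analyzed"), indent=2))
# The message-id below is a placeholder, not a real message.
print(json.dumps(exact_id_query("message-id", "<example@localhost>"), indent=2))
```

Only the second kind of query would actually benefit from the not_analyzed mappings, which is why a search path that only ever issues the first kind makes them largely moot.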
