On 7 November 2016 at 01:36, John D. Ament <[email protected]> wrote:
> On Sun, Nov 6, 2016 at 8:22 PM sebb <[email protected]> wrote:
>
>> On 6 November 2016 at 14:37, John D. Ament <[email protected]> wrote:
>> > On Sun, Nov 6, 2016 at 9:27 AM Daniel Gruno <[email protected]> wrote:
>> >
>> >> On 11/06/2016 03:18 PM, sebb wrote:
>> >> > Fields such as message-id are stored as text strings, but they are
>> >> > only really intended to be used as ids. They don't contain independent
>> >> > text parts.
>> >> >
>> >> > From what I have understood so far from reading the ES docs, such
>> >> > fields should be tagged as
>> >> >
>> >> > "index": "not_analyzed"
>> >> >
>> >> > AIUI this reduces the analysis overhead and storage requirements, and
>> >> > also makes it harder to find fields with
>> >> > This probably applies to other fields in "mbox":
>> >> >
>> >> > mid
>> >> > possibly in-reply-to
>> >> > also references
>> >> >
>> >> > And of course the auto-created fields such as attachments
>> >> >
>> >> > Likewise the doc types currently missing from setup.py:
>> >> >
>> >> > notifications
>> >> > account
>> >> > mailinglists
>> >> >
>> >> > These are internal use only so are not intended for searching.
>> >> >
>> >> > Or have I got this completely wrong?
>> >> >
>> >>
>> >> message-id is set to not be analyzed, by the setup script (it's in the
>> >> mappings it sends to ES when creating the index). mid and in-reply-to
>> >> should probably also be not analyzed, although mid is really a copy of
>> >> the doc ID, IIRC. the list ID is also not analyzed by default (as
>> >> list_raw), neither is the raw from address
>> >>
>> >
>> > So I notice the query process is an arbitrary full text query, which runs
>> > against _all.
>> > https://github.com/apache/incubator-ponymail/blob/master/site/api/lib/elastic.lua#L44
>>
>> Huh?
>>
>> The query starts:
>>
>> local url = config.es_url .. doc .. "/_search?q="..query
>>
>> where
>>
>> es_url = "http://localhost:9200/ponymail/"
>>
>> and
>>
>> doc = "mbox" by default.
>>
>> Where does the _all come in?
>>
>
> When you do a query string query in elastic search (reference:
> https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html)
> the default field unless specified is "_all". I can't find anything in the
> pony code that changes this field. As a result, its going to search _all
> by default.
>

Sorry, I thought you were referring to the _all doc type.

But I'm not sure what this has to do with my original e-mail about
which fields should be indexed, and which should not.

>>
>> > unless
>> > I need to dig into it a bit further to see if there's something building up
>> > query a bit different.
>> >
>> > So... that means most of these mappings are moot.
>>
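
To make the two separate points in this exchange concrete, here is a rough Python sketch. It is not the Pony Mail code (which is Lua), and the function names are made up for illustration. The first function mirrors the URL construction quoted above and optionally adds "df", the URI search parameter that names the default field for a q= query. The second builds a mapping fragment that marks ID-like fields as not analyzed, assuming the pre-5.x string mapping syntax quoted in the thread. The "subject" field in the usage note is only a hypothetical example.

```python
from urllib.parse import urlencode

# Values quoted in the thread; everything else here is illustrative.
ES_URL = "http://localhost:9200/ponymail/"
DOC = "mbox"


def search_url(query, default_field=None):
    """Build a URI search URL like the one quoted from elastic.lua.

    Without default_field, the q= query is left to Elasticsearch's
    default field (the "_all" field under discussion in the thread).
    With it, "df" restricts unprefixed terms to the named field.
    """
    params = {"q": query}
    if default_field is not None:
        params["df"] = default_field
    return ES_URL + DOC + "/_search?" + urlencode(params)


def id_field_mapping(fields):
    """Return a mapping fragment that marks each field as not analyzed.

    Assumes the string/not_analyzed syntax from the thread; this is a
    sketch, not the mapping that setup.py actually sends.
    """
    return {
        "properties": {
            name: {"type": "string", "index": "not_analyzed"}
            for name in fields
        }
    }
```

For example, search_url("pony") leaves the default field up to Elasticsearch, while search_url("pony", "subject") would limit unprefixed terms to a (hypothetical) subject field. id_field_mapping(["mid", "in-reply-to", "references"]) covers the fields suggested in the original message. The two are independent: the first affects which field a query hits, the second affects how a field is indexed.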
