On 7 November 2016 at 01:36, John D. Ament <[email protected]> wrote:
> On Sun, Nov 6, 2016 at 8:22 PM sebb <[email protected]> wrote:
>
>> On 6 November 2016 at 14:37, John D. Ament <[email protected]> wrote:
>> > On Sun, Nov 6, 2016 at 9:27 AM Daniel Gruno <[email protected]> wrote:
>> >
>> >> On 11/06/2016 03:18 PM, sebb wrote:
>> >> > Fields such as message-id are stored as text strings, but they are
>> >> > only really intended to be used as ids. They don't contain independent
>> >> > text parts.
>> >> >
>> >> > From what I have understood so far from reading the ES docs, such
>> >> > fields should be tagged as
>> >> >
>> >> > "index": "not_analyzed"
>> >> >
>> >> > AIUI this reduces the analysis overhead and storage requirements, and
>> >> > also makes it harder to find fields with
>> >> > This probably applies to other fields in "mbox":
>> >> >
>> >> > mid
>> >> > possibly in-reply-to
>> >> > also references
>> >> >
>> >> > And of course the auto-created fields such as attachments
>> >> >
>> >> > Likewise the doc types currently missing from setup.py:
>> >> >
>> >> > notifications
>> >> > account
>> >> > mailinglists
>> >> >
>> >> > These are internal use only so are not intended for searching.
>> >> >
>> >> > Or have I got this completely wrong?
>> >> >
>> >>
>> >> message-id is set to not be analyzed, by the setup script (it's in the
>> >> mappings it sends to ES when creating the index). mid and in-reply-to
>> >> should probably also be not analyzed, although mid is really a copy of
>> >> the doc ID, IIRC. the list ID is also not analyzed by default (as
>> >> list_raw), neither is the raw from address
>> >>
>> >
>> > So I notice the query process is an arbitrary full text query, which runs
>> > against _all.
>> >
>> > https://github.com/apache/incubator-ponymail/blob/master/site/api/lib/elastic.lua#L44
>>
>> Huh?
>>
>> The query starts:
>>
>> local url = config.es_url .. doc .. "/_search?q="..query
>>
>> where
>>
>> es_url = "http://localhost:9200/ponymail/";
>>
>> and
>>
>> doc = "mbox" by default.
>>
>> Where does the _all come in?
>>
>
> When you do a query string query in Elasticsearch (reference:
> https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html)
> the default field, unless specified, is "_all".  I can't find anything in the
> pony code that changes this field.  As a result, it's going to search _all
> by default.
>

Sorry, I thought you were referring to the _all doc type.
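
To make the distinction concrete, here is a rough Python sketch of the
URL construction quoted above (the real code is Lua, in elastic.lua).
The search_url helper and its optional field argument are hypothetical
and exist only for illustration; the URL-encoding step is also my
addition. A bare query carries no field name, so ES would fall back to
its default field, whereas a "field:" prefix scopes the query to one
field.

```python
from urllib.parse import quote

# Values as quoted earlier in this thread.
ES_URL = "http://localhost:9200/ponymail/"
DOC = "mbox"


def search_url(query, field=None):
    """Build a URI-search URL (hypothetical helper, not Pony Mail code).

    With field=None the query string names no field, so the server
    decides which field to search (its default field).
    """
    if field is not None:
        # Lucene query-string syntax: "field:term" limits the search.
        query = f"{field}:{query}"
    return f"{ES_URL}{DOC}/_search?q={quote(query)}"


unscoped = search_url("hello")
scoped = search_url("hello", field="message-id")
```

Here "unscoped" relies entirely on the server-side default, which is
the case John describes.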

But I'm not sure what this has to do with my original e-mail about
which fields should be indexed, and which should not.
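
For reference, this is roughly the kind of mapping I had in mind,
written as a Python sketch. It is not the actual content of setup.py:
the id_field_mappings helper is hypothetical, and the field list is
just the candidates named earlier in this thread. It assumes the
string/not_analyzed syntax of ES versions before 5.0; later versions
use the keyword type instead.

```python
import json

# Candidate ID-like fields mentioned in this thread (an assumption,
# not a list taken from setup.py).
ID_FIELDS = ["message-id", "mid", "in-reply-to", "references"]


def id_field_mappings(fields):
    """Map each field as an exact-value string that is not analyzed."""
    return {
        name: {"type": "string", "index": "not_analyzed"}
        for name in fields
    }


# Mapping body for the "mbox" doc type, in the pre-5.0 mapping layout.
mapping = {"mbox": {"properties": id_field_mappings(ID_FIELDS)}}

if __name__ == "__main__":
    print(json.dumps(mapping, indent=2))
```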

>>
>> > unless
>> > I need to dig into it a bit further to see if there's something building
>> > up query a bit different.
>> >
>> > So... that means most of these mappings are moot.
>>
