Hi Jochen,

localhost:9200/_cat/indices?v reveals that graylog2_3 is the only index in 
my Elasticsearch cluster:

health status index      pri rep docs.count docs.deleted store.size pri.store.size
green  open   graylog2_3   4   0     180443            0    139.3mb        139.3mb


localhost:9200/_template/ reveals that the graylog-internal template which I 
included in my previous message is the only template in the cluster.


I should mention that when I try to tokenize the following string in 
Elasticsearch with the index as well as the "message" field specified in the 
URL, it works as it should, since the message field uses the whitespace 
analyzer:


curl 'localhost:9200/graylog2_3/_analyze?field=message&pretty=true' -d 'This is a $test:[to.see.if graylog() work$.'

{
  "tokens" : [ {
    "token" : "This",
    "start_offset" : 0,
    "end_offset" : 4,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "is",
    "start_offset" : 5,
    "end_offset" : 7,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "a",
    "start_offset" : 8,
    "end_offset" : 9,
    "type" : "word",
    "position" : 3
  }, {
    "token" : "$test:[to.see.if",
    "start_offset" : 10,
    "end_offset" : 26,
    "type" : "word",
    "position" : 4
  }, {
    "token" : "graylog()",
    "start_offset" : 27,
    "end_offset" : 36,
    "type" : "word",
    "position" : 5
  }, {
    "token" : "work$.",
    "start_offset" : 37,
    "end_offset" : 43,
    "type" : "word",
    "position" : 6
  } ]
}
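
To line this output up against the shorter list that Graylog returns, the token strings can be pulled out of the response. A minimal Python sketch, assuming the response body has the shape shown above; the helper name token_strings is made up for illustration and the sample is abbreviated to the first and last token:

```python
import json
from typing import List


def token_strings(analyze_response: str) -> List[str]:
    """Return only the "token" values from an _analyze JSON response body."""
    body = json.loads(analyze_response)
    return [entry["token"] for entry in body["tokens"]]


# Abbreviated copy of the response shown above (first and last token only).
sample = """
{
  "tokens" : [ {
    "token" : "This", "start_offset" : 0, "end_offset" : 4,
    "type" : "word", "position" : 1
  }, {
    "token" : "work$.", "start_offset" : 37, "end_offset" : 43,
    "type" : "word", "position" : 6
  } ]
}
"""

print(token_strings(sample))
```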


This tells me that ES is using the whitespace analyzer correctly.  However, the 
Graylog API browser is giving me a different result:


http://localhost:12900/messages/graylog2_3/analyze?string=This%20is%20a%20%24test%3A%5Bto.see.if%20graylog()%20work%24%5D.&pretty=true

{
  "tokens" : [ "this", "is", "a", "test", "to.see.if", "graylog", "work" ]
}
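
The two outputs differ the way whitespace tokenization and the default standard analysis would. The Python sketch below is only an approximation for illustration: roughly_standard_tokens is a hypothetical regex stand-in, not the Unicode segmentation that Elasticsearch actually uses, and both function names are made up. It decodes the string from the Graylog URL above (which, once decoded, contains a "]" that the curl string does not) and tokenizes it both ways:

```python
import re
from urllib.parse import unquote


def whitespace_tokens(text):
    """Split on whitespace only, keeping case and punctuation."""
    return text.split()


def roughly_standard_tokens(text):
    """Crude stand-in for standard analysis (an assumption, not the real
    algorithm): lowercase, keep runs of ASCII letters/digits, and allow
    single dots between such runs."""
    return re.findall(r"[a-z0-9]+(?:\.[a-z0-9]+)*", text.lower())


# Value of the "string" query parameter from the Graylog URL, percent-decoded.
encoded = "This%20is%20a%20%24test%3A%5Bto.see.if%20graylog()%20work%24%5D."
text = unquote(encoded)

print(text)
print(whitespace_tokens(text))
print(roughly_standard_tokens(text))
```

On this input the regex stand-in reproduces the token list returned by Graylog, while the whitespace split keeps case and punctuation the way the field-level _analyze call does.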


Is this the result that I should be seeing?  Is there anything else that I 
can test in order to help me troubleshoot this further?  Thanks.

Sincerely,

On Monday, May 9, 2016 at 8:49:41 AM UTC-4, Jochen Schalanda wrote:
>
> Hi Dilip,
>
> are there any other conflicting index templates/mappings in your 
> Elasticsearch cluster?
>
> Other than that, the index mapping for graylog2_3 is looking fine and ES 
> should use the whitespace analyzer for messages indexed into this index.
>
> Cheers,
> Jochen
>
> On Friday, 6 May 2016 22:01:42 UTC+2, Dilip Muthukrishnan wrote:
>>
>> Hi Jochen,
>>
>> I'm still stuck on this one.  Any help would be appreciated.  Thanks.
>>
>> Sincerely,
>>
>> Dilip M.
>>
>> On Tuesday, May 3, 2016 at 9:32:37 AM UTC-4, Dilip Muthukrishnan wrote:
>>>
>>> Hi Jochen,
>>>
>>> Here's what my "graylog-internal" template currently looks like (as seen 
>>> via the Elasticsearch API):
>>>
>>> {
>>>   "graylog-internal" : {
>>>     "order" : 0,
>>>     "template" : "graylog2_*",
>>>     "settings" : { },
>>>     "mappings" : {
>>>       "message" : {
>>>         "_source" : {
>>>           "compress" : true,
>>>           "enabled" : true
>>>         },
>>>         "dynamic_templates" : [ {
>>>           "internal_fields" : {
>>>             "mapping" : {
>>>               "index" : "not_analyzed",
>>>               "doc_values" : true
>>>             },
>>>             "match" : "gl2_*"
>>>           }
>>>         }, {
>>>           "store_generic" : {
>>>             "mapping" : {
>>>               "index" : "not_analyzed"
>>>             },
>>>             "match" : "*"
>>>           }
>>>         } ],
>>>         "_ttl" : {
>>>           "enabled" : true
>>>         },
>>>         "properties" : {
>>>           "message" : {
>>>             "index" : "analyzed",
>>>             "analyzer" : "whitespace",
>>>             "type" : "string"
>>>           },
>>>           "timestamp" : {
>>>             "format" : "yyyy-MM-dd HH:mm:ss.SSS",
>>>             "doc_values" : true,
>>>             "type" : "date"
>>>           },
>>>           "source" : {
>>>             "index" : "analyzed",
>>>             "analyzer" : "analyzer_keyword",
>>>             "type" : "string"
>>>           },
>>>           "full_message" : {
>>>             "index" : "analyzed",
>>>             "analyzer" : "whitespace",
>>>             "type" : "string"
>>>           }
>>>         }
>>>       }
>>>     },
>>>     "aliases" : { }
>>>   }
>>> }
>>>
>>>
>>> Here's what my graylog2_3 index currently looks like (as seen via the 
>>> Elasticsearch API):
>>>
>>> {
>>>   "graylog2_3" : {
>>>     "aliases" : {
>>>       "graylog2_deflector" : { }
>>>     },
>>>     "mappings" : {
>>>       "message" : {
>>>         "dynamic_templates" : [ {
>>>           "internal_fields" : {
>>>             "mapping" : {
>>>               "index" : "not_analyzed",
>>>               "doc_values" : true
>>>             },
>>>             "match" : "gl2_*"
>>>           }
>>>         }, {
>>>           "store_generic" : {
>>>             "mapping" : {
>>>               "index" : "not_analyzed"
>>>             },
>>>             "match" : "*"
>>>           }
>>>         } ],
>>>         "_ttl" : {
>>>           "enabled" : true
>>>         },
>>>         "_source" : {
>>>           "compress" : true
>>>         },
>>>         "properties" : {
>>>           "full_message" : {
>>>             "type" : "string",
>>>             "analyzer" : "whitespace"
>>>           },
>>>           "gl2_remote_ip" : {
>>>             "type" : "string",
>>>             "index" : "not_analyzed",
>>>             "doc_values" : true
>>>           },
>>>           "gl2_remote_port" : {
>>>             "type" : "long",
>>>             "doc_values" : true
>>>           },
>>>           "gl2_source_collector" : {
>>>             "type" : "string",
>>>             "index" : "not_analyzed",
>>>             "doc_values" : true
>>>           },
>>>           "gl2_source_collector_input" : {
>>>             "type" : "string",
>>>             "index" : "not_analyzed",
>>>             "doc_values" : true
>>>           },
>>>           "gl2_source_input" : {
>>>             "type" : "string",
>>>             "index" : "not_analyzed",
>>>             "doc_values" : true
>>>           },
>>>           "gl2_source_node" : {
>>>             "type" : "string",
>>>             "index" : "not_analyzed",
>>>             "doc_values" : true
>>>           },
>>>           "level" : {
>>>             "type" : "string",
>>>             "index" : "not_analyzed"
>>>           },
>>>           "message" : {
>>>             "type" : "string",
>>>             "analyzer" : "whitespace"
>>>           },
>>>           "source" : {
>>>             "type" : "string",
>>>             "analyzer" : "analyzer_keyword"
>>>           },
>>>           "source_file" : {
>>>             "type" : "string",
>>>             "index" : "not_analyzed"
>>>           },
>>>           "timestamp" : {
>>>             "type" : "date",
>>>             "doc_values" : true,
>>>             "format" : "yyyy-MM-dd HH:mm:ss.SSS"
>>>           },
>>>           "version" : {
>>>             "type" : "string",
>>>             "index" : "not_analyzed"
>>>           }
>>>         }
>>>       }
>>>     },
>>>     "settings" : {
>>>       "index" : {
>>>         "creation_date" : "1462197971182",
>>>         "uuid" : "ylBuS8y3SBKRYMyLuMWApg",
>>>         "analysis" : {
>>>           "analyzer" : {
>>>             "analyzer_keyword" : {
>>>               "filter" : "lowercase",
>>>               "tokenizer" : "keyword"
>>>             }
>>>           }
>>>         },
>>>         "number_of_replicas" : "0",
>>>         "number_of_shards" : "4",
>>>         "version" : {
>>>           "created" : "1070399"
>>>         }
>>>       }
>>>     },
>>>     "warmers" : { }
>>>   }
>>> }
>>>
>>>
>>> After cycling the deflector so that it points to the new index, 
>>> graylog2_3, I proceeded to delete my old indices.
>>>
>>> Using the Graylog API browser, I tried to tokenize a random string (This 
>>> is a $test:[to.see.if graylog() work$.):
>>>
>>>
>>> http://vtor-lx-tomcat-d01:12900/messages/graylog2_3/analyze?string=This%20is%20a%20%24test%3A%5Bto.see.if%20graylog()%20work%24%5D.&pretty=true
>>>
>>> {
>>>   "tokens" : [ "this", "is", "a", "test", "to.see.if", "graylog", "work" ]
>>> }
>>>
>>>
>>> This makes sense because if I attempt to tokenize the same string via 
>>> Elasticsearch (using the same index), I get the same result:
>>>
>>> curl 'vtor-lx-tomcat-d01:9200/graylog2_3/_analyze?pretty=true' -d 'This is a $test:[to.see.if graylog() work$.'
>>>
>>> {
>>>   "tokens" : [ {
>>>     "token" : "this",
>>>     "start_offset" : 0,
>>>     "end_offset" : 4,
>>>     "type" : "<ALPHANUM>",
>>>     "position" : 1
>>>   }, {
>>>     "token" : "is",
>>>     "start_offset" : 5,
>>>     "end_offset" : 7,
>>>     "type" : "<ALPHANUM>",
>>>     "position" : 2
>>>   }, {
>>>     "token" : "a",
>>>     "start_offset" : 8,
>>>     "end_offset" : 9,
>>>     "type" : "<ALPHANUM>",
>>>     "position" : 3
>>>   }, {
>>>     "token" : "test",
>>>     "start_offset" : 11,
>>>     "end_offset" : 15,
>>>     "type" : "<ALPHANUM>",
>>>     "position" : 4
>>>   }, {
>>>     "token" : "to.see.if",
>>>     "start_offset" : 17,
>>>     "end_offset" : 26,
>>>     "type" : "<ALPHANUM>",
>>>     "position" : 5
>>>   }, {
>>>     "token" : "graylog",
>>>     "start_offset" : 27,
>>>     "end_offset" : 34,
>>>     "type" : "<ALPHANUM>",
>>>     "position" : 6
>>>   }, {
>>>     "token" : "work",
>>>     "start_offset" : 37,
>>>     "end_offset" : 41,
>>>     "type" : "<ALPHANUM>",
>>>     "position" : 7
>>>   } ]
>>> }
>>>
>>> However, without specifying the index in Elasticsearch, I get the result 
>>> that I am looking for:
>>>
>>> curl 'vtor-lx-tomcat-d01:9200/_analyze?analyzer=whitespace&pretty=true' -d 'This is a $test:[to.see.if graylog() work$.'
>>>
>>> {
>>>   "tokens" : [ {
>>>     "token" : "This",
>>>     "start_offset" : 0,
>>>     "end_offset" : 4,
>>>     "type" : "word",
>>>     "position" : 1
>>>   }, {
>>>     "token" : "is",
>>>     "start_offset" : 5,
>>>     "end_offset" : 7,
>>>     "type" : "word",
>>>     "position" : 2
>>>   }, {
>>>     "token" : "a",
>>>     "start_offset" : 8,
>>>     "end_offset" : 9,
>>>     "type" : "word",
>>>     "position" : 3
>>>   }, {
>>>     "token" : "$test:[to.see.if",
>>>     "start_offset" : 10,
>>>     "end_offset" : 26,
>>>     "type" : "word",
>>>     "position" : 4
>>>   }, {
>>>     "token" : "graylog()",
>>>     "start_offset" : 27,
>>>     "end_offset" : 36,
>>>     "type" : "word",
>>>     "position" : 5
>>>   }, {
>>>     "token" : "work$.",
>>>     "start_offset" : 37,
>>>     "end_offset" : 43,
>>>     "type" : "word",
>>>     "position" : 6
>>>   } ]
>>> }
>>>
>>> I feel like I am really close to an answer here.  It appears that there 
>>> is something wrong with my index mapping/settings.
>>>
>>> Sincerely,
>>>
>>> On Tuesday, May 3, 2016 at 3:51:49 AM UTC-4, Jochen Schalanda wrote:
>>>>
>>>> Hi Dilip,
>>>>
>>>> are you 100% sure that the message is in a new index, that the index 
>>>> template/mapping was properly applied (see 
>>>> https://www.elastic.co/guide/en/elasticsearch/reference/1.7/indices-get-mapping.html),
>>>> and that it is the "message" field you were looking for (and not 
>>>> "full_message" or another field)?
>>>>
>>>> Cheers,
>>>> Jochen
>>>>
>>>> On Monday, 2 May 2016 18:57:40 UTC+2, Dilip Muthukrishnan wrote:
>>>>>
>>>>> Hi Jochen,
>>>>>
>>>>> Thanks for your reply.  I'm using graylog-1.3.4 (server).  I removed 
>>>>> and added an updated version of the "graylog-internal" template and then 
>>>>> cycled the deflector through the web interface.  The new index mapping 
>>>>> reflects the changes:
>>>>>
>>>>> "message" : {
>>>>>    "type" : "string",
>>>>>    "analyzer" : "whitespace"
>>>>> }
>>>>>
>>>>>
>>>>> However, it doesn't appear to be reflected in the search.  This
>>>>> message is from the latest index, but based on this tokenization, it
>>>>> appears to still be using the old "standard" analyzer:
>>>>>
>>>>> 02.05.2016 12:47:33.488 *ERROR* [Shell Script Executor Thread for cpu.sh] com.day.crx.core.CRXSessionImpl session# 144563 opened (103)
>>>>> java.lang.Exception: Stack Trace
>>>>>     at com.day.crx.core.CRXSessionImpl$Tracker.open(CRXSessionImpl.java:212)
>>>>>     at com.day.crx.core.CRXSessionImpl$Tracker.<init>(CRXSessionImpl.java:205)
>>>>>     at com.day.crx.core.CRXSessionImpl.<init>(CRXSessionImpl.java:179)
>>>>>     at com.day.crx.core.CRXRepositoryImpl.createSessionInstance(CRXRepositoryImpl.java:911)
>>>>>     at org.apache.jackrabbit.core.RepositoryImpl.createSession(RepositoryImpl.java:959)
>>>>>     at org.apache.jackrabbit.core.SessionFactory.createAdminSession(SessionFactory.java:42)
>>>>>     at com.day.crx.sling.server.impl.SlingRepositoryWrapper.loginAdministrative(SlingRepositoryWrapper.java:76)
>>>>>     at com.adobe.granite.monitoring.impl.ShellScriptExecutorImpl.extractScript(ShellScriptExecutorImpl.java:161)
>>>>>     at com.adobe.granite.monitoring.impl.ShellScriptExecutorImpl.execute(ShellScriptExecutorImpl.java:114)
>>>>>     at com.adobe.granite.monitoring.impl.ScriptMBean.invoke(ScriptMBean.java:99)
>>>>>     at com.adobe.granite.monitoring.impl.ScriptMBean.invoke(ScriptMBean.java:158)
>>>>>     at com.adobe.granite.monitoring.impl.ScriptConfigImpl$ExecutionThread.run(ScriptConfigImpl.java:208)
>>>>>     at java.lang.Thread.run(Thread.java:662)
>>>>>
>>>>>
>>>>> Field terms: 02.05.2016124733.488errorshellscriptexecutorthreadfor
>>>>> cpu.shcom.day.crx.core.crxsessionimplsession144563opened103
>>>>> java.lang.exceptionstacktraceattracker.opencrxsessionimpl.java212
>>>>> trackerinit205179
>>>>> com.day.crx.core.crxrepositoryimpl.createsessioninstance
>>>>> crxrepositoryimpl.java911
>>>>> org.apache.jackrabbit.core.repositoryimpl.createsession
>>>>> repositoryimpl.java959
>>>>> org.apache.jackrabbit.core.sessionfactory.createadminsession
>>>>> sessionfactory.java42
>>>>> com.day.crx.sling.server.impl.slingrepositorywrapper.loginadministrative
>>>>> slingrepositorywrapper.java76
>>>>> com.adobe.granite.monitoring.impl.shellscriptexecutorimpl.extractscript
>>>>> shellscriptexecutorimpl.java161
>>>>> com.adobe.granite.monitoring.impl.shellscriptexecutorimpl.execute114
>>>>> com.adobe.granite.monitoring.impl.scriptmbean.invokescriptmbean.java99
>>>>> 158com.adobe.granite.monitoring.impl.scriptconfigimpl
>>>>> executionthread.runscriptconfigimpl.java208java.lang.thread.run
>>>>> thread.java662
>>>>>
>>>>> As you can see, it has been stripped of various characters like colons 
>>>>> and parentheses.
>>>>>
>>>>>
>>>>> On Monday, May 2, 2016 at 12:36:38 PM UTC-4, Jochen Schalanda wrote:
>>>>>>
>>>>>> Hi Dilip,
>>>>>>
>>>>>> the index mapping of Graylog is applied by means of an index
>>>>>> template. In Graylog 2.0.0, the index template will automatically be
>>>>>> updated, but in older versions you'll have to remove the index template
>>>>>> yourself for it to be recreated by Graylog.
>>>>>>
>>>>>> See 
>>>>>> https://www.elastic.co/guide/en/elasticsearch/reference/1.7/indices-templates.html
>>>>>> for details.
>>>>>>
>>>>>> Cheers,
>>>>>> Jochen
>>>>>>
>>>>>> On Thursday, 28 April 2016 21:42:23 UTC+2, Dilip Muthukrishnan wrote:
>>>>>>>
>>>>>>> I'm trying to change the analyzer from "standard" to "whitespace".
>>>>>>> I've set the following property in my Graylog server configuration:
>>>>>>>
>>>>>>> elasticsearch_analyzer = whitespace
>>>>>>>
>>>>>>> It states that my change will be applied to new indices so I
>>>>>>> manually cycled the deflector so that it is now pointing to graylog2_1
>>>>>>> (previously graylog2_0).  However, the new index still uses the
>>>>>>> "standard" analyzer based on the mapping in Elasticsearch:
>>>>>>>
>>>>>>> "message" : {
>>>>>>>             "type" : "string",
>>>>>>>             "analyzer" : "standard"
>>>>>>>           },
>>>>>>>
>>>>>>>
>>>>>>> How do I change the analyzer?
>>>>>>>
>>>>>>>
