Re: Elasticsearch shuts down for no reason

2014-08-24 Thread Eitan Vesely

Thanks Mark,

auth.log doesn't show any login or sudo at the time Elasticsearch 
stopped...
nothing else is running on that machine - it is a dedicated ES server.

What I did find in the auth log is that someone is trying to hack into the 
system, but I don't see what that has to do with Elasticsearch stopping?

On Sunday, August 24, 2014 4:35:41 AM UTC+3, Mark Walkom wrote:

 Something is stopping the service.

 If you are on linux check the auth log, if anyone is using sudo to stop it 
 then you will see that logged. Otherwise, what else runs on the machine?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 24 August 2014 06:15, Eitan Vesely eita...@gmail.com wrote:

 Hi Guys,
 I installed ES a month ago and it has been working just fine.

 Today ES just went down, with no visible cause.

 Here is what I see in the log file:

 [2014-08-23 16:47:11,272][DEBUG][action.search.type   ] [Plunderer] 
 [g30nm0bi2j663tgu6ud][1], node[Vc4xSuh1S1qQOvQdv-wD_A], [P], s[STARTED]: 
 Failed to execute [org.elasticsearch.action.search.SearchRequest@5531dfad] 
 lastShard [true]
 org.elasticsearch.search.SearchParseException: [g30nm0bi2j663tgu6ud][1]: 
 from[-1],size[-1]: Parse Failure [Failed to parse source 
 [{facets:{0:{date_histogram:{key_field:@timestamp,value_field:user_count,interval:1h},global:true,facet_filter:{fquery:{query:{filtered:{query:{query_string:{query:*}},filter:{bool:{must:[{range:{@timestamp:{from:1407602785182,to:1408812385182}}},{range:{@timestamp:{from:1408516424602,to:1408811520255}}}],size:0}]]
  at 
 org.elasticsearch.search.SearchService.parseSource(SearchService.java:649)
 at 
 org.elasticsearch.search.SearchService.createContext(SearchService.java:511)
  at 
 org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:483)
 at 
 org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:252)
  at 
 org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:206)
 at 
 org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:203)
  at 
 org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:517)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: org.elasticsearch.search.facet.FacetPhaseExecutionException: 
 Facet [0]: (key) field [@timestamp] not found
 at 
 org.elasticsearch.search.facet.datehistogram.DateHistogramFacetParser.parse(DateHistogramFacetParser.java:160)
  at 
 org.elasticsearch.search.facet.FacetParseElement.parse(FacetParseElement.java:93)
 at 
 org.elasticsearch.search.SearchService.parseSource(SearchService.java:633)
  ... 9 more
 [2014-08-23 16:47:11,273][DEBUG][action.search.type   ] [Plunderer] 
 [g30nm0bi2j663tgu6ud][0], node[Vc4xSuh1S1qQOvQdv-wD_A], [P], s[STARTED]: 
 Failed to execute [org.elasticsearch.action.search.SearchRequest@5531dfad]
 org.elasticsearch.search.SearchParseException: [g30nm0bi2j663tgu6ud][0]: 
 from[-1],size[-1]: Parse Failure [Failed to parse source 
 [{facets:{0:{date_histogram:{key_field:@timestamp,value_field:user_count,interval:1h},global:true,facet_filter:{fquery:{query:{filtered:{query:{query_string:{query:*}},filter:{bool:{must:[{range:{@timestamp:{from:1407602785182,to:1408812385182}}},{range:{@timestamp:{from:1408516424602,to:1408811520255}}}],size:0}]]
  at 
 org.elasticsearch.search.SearchService.parseSource(SearchService.java:649)
 at 
 org.elasticsearch.search.SearchService.createContext(SearchService.java:511)
  at 
 org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:483)
 at 
 org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:252)
  at 
 org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:206)
 at 
 org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:203)
  at 
 org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:517)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: org.elasticsearch.search.facet.FacetPhaseExecutionException: 
 Facet [0]: (key) field [@timestamp] not found
 at 
 org.elasticsearch.search.facet.datehistogram.DateHistogramFacetParser.parse(DateHistogramFacetParser.java:160)
  at 
 org.elasticsearch.search.facet.FacetParseElement.parse(FacetParseElement.java:93)
 at 
 org.elasticsearch.search.SearchService.parseSource(SearchService.java:633)
  ... 9 more
 

Re: Elasticsearch shuts down for no reason

2014-08-24 Thread Eitan Vesely

I did find the shutdown request in the syslog:

Aug 23 16:49:01 medisafelog2 kernel: [3361057.489168] hv_utils: Shutdown 
request received - graceful shutdown initiated

Yet I have no idea who or what initiated it... how can I dig in?


Re: Elasticsearch shuts down for no reason

2014-08-24 Thread joergpra...@gmail.com
The company providing the hosting service shut down the virtual machine;
hv_utils is a message from the Hyper-V hypervisor.

This is not related to Elasticsearch at all.

Jörg


On Sun, Aug 24, 2014 at 11:19 AM, Mark Walkom ma...@campaignmonitor.com
wrote:

 What version of ES are you running?
 Are you running on a hosting service, and if so do you have a firewall
 protecting the host - i.e. it's not open to the entire internet?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com



date_histogram facet float possible overflow

2014-08-24 Thread moshe zada
 

Hi all,

I am using the ELK stack to visualise our monitoring data. Yesterday I came 
across a weird problem: the Elasticsearch date_histogram facet returned 
floating-point results that look like an overflow (min: 4.604480259023595E18).
Our data flow is: collectd (cpu/memory) -> riemann -> logstash -> elasticsearch.

At first the values were correct; after a few days the values became huge 
(see attached snapshot of the Kibana graph).

*filtered query + Result:*

*query:*
curl -XGET 'http://localhost:9200/logstash-2014.08.24/_search?pretty' -d '{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "should": [
            {
              "query_string": {
                "query": "subservice.raw:\"processes-cpu_percent/gauge-collectd\" AND (plugin_instance:\"cpu_percent\")"
              }
            }
          ]
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "@timestamp": {
                  "from": 1408884312966,
                  "to": 1408884612966
                }
              }
            },
            {
              "range": {
                "@timestamp": {
                  "from": 1408884311948,
                  "to": 1408884327941
                }
              }
            },
            {
              "fquery": {
                "query": {
                  "query_string": {
                    "query": "subservice:(\"processes-cpu_percent/gauge-collectd\")"
                  }
                },
                "_cache": false
              }
            }
          ]
        }
      }
    }
  },
  "size": 500,
  "sort": [
    {
      "metric": {
        "order": "desc",
        "ignore_unmapped": false
      }
    },
    {
      "@timestamp": {
        "order": "desc",
        "ignore_unmapped": false
      }
    }
  ]
}'




*result:*
{
  "took" : 47,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : null,
    "hits" : [ {
      "_index" : "logstash-2014.08.24",
      "_type" : "gauge",
      "_id" : "SlzG8bGJQziU0LMoN7nrbQ",
      "_score" : null,
      "_source" : {"host":"host1","service":"instance-2014-08-24T1106/processes-cpu_percent/gauge-collectd","state":null,"description":null,"metric":0.7,"tags":["collectd"],"time":"2014-08-24T12:45:25.000Z","ttl":20.0,"type":"gauge","source":"host1","ds_type":"gauge","plugin_instance":"cpu_percent","ds_name":"value","type_instance":"collectd","plugin":"processes","ds_index":0,"@version":"1","@timestamp":"2014-08-24T12:45:15.079Z"},
      "sort" : [ 4604480259023595110, 1408884325088 ]
    }, {
      "_index" : "logstash-2014.08.24",
      "_type" : "gauge",
      "_id" : "8hxToMjpQ5WQIw15DQqIGA",
      "_score" : null,
      "_source" : {"host":"host1","service":"instance-2014-08-24T1106/processes-cpu_percent/gauge-collectd","state":null,"description":null,"metric":0.5,"tags":["collectd"],"time":"2014-08-24T12:45:15.000Z","ttl":20.0,"type":"gauge","source":"host1","ds_type":"gauge","plugin_instance":"cpu_percent","ds_name":"value","type_instance":"collectd","plugin":"processes","ds_index":0,"@version":"1","@timestamp":"2014-08-24T12:45:15.079Z"},
      "sort" : [ 4602678819172646912, 1408884315079 ]
    } ]
  }
}
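The two huge sort values above are suspicious in a very specific way: 4604480259023595110 and 4602678819172646912 are exactly the raw IEEE-754 bit patterns of the doubles 0.7 and 0.5, the metric values of the two hits. That suggests the sort values are the doubles' long bits rather than the numeric values themselves. A quick check in Python:

```python
import struct

def double_bits(x: float) -> int:
    # Reinterpret the 8 bytes of an IEEE-754 double as a signed 64-bit integer.
    return struct.unpack("<q", struct.pack("<d", x))[0]

print(double_bits(0.7))  # 4604480259023595110 -- the first hit's sort value
print(double_bits(0.5))  # 4602678819172646912 -- the second hit's sort value
```
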




*date histogram facet + result:*

*query:*
curl -XGET 'http://localhost:9200/logstash-2014.08.24/_search?pretty' -d '{
  "facets": {
    "0": {
      "date_histogram": {
        "key_field": "@timestamp",
        "value_field": "metric",
        "interval": "1s"
      },
      "global": true,
      "facet_filter": {
        "fquery": {
          "query": {
            "filtered": {
              "query": {
                "query_string": {
                  "query": "subservice.raw:\"processes-cpu_percent/gauge-collectd\" AND (plugin_instance:cpu_percent) AND *"
                }
              },
              "filter": {
                "bool": {
                  "must": [
                    {
                      "range": {
                        "@timestamp": {
                          "from": 1408884199622,
                          "to": 1408884499623
                        }
                      }
                    },
                    {
                      "range": {
                        "@timestamp": {
                          "from": 1408884311948,
                          "to": 1408884327941
                        }
                      }
                    },
                    {
                      "fquery": {
                        "query": {
                          "query_string": {
                            "query": "subservice:(\"processes-cpu_percent/gauge-collectd\")"
                          }
                        },
                        "_cache": true
                      }
                    }
                  ]
                }
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}'



*result:*
{
  "took" : 24,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1197141,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "facets" : {
    "0" : {
      "_type" : "date_histogram",
      "entries" : [ {

Re: Json Data not getting parsed when sent to Elasticsearch

2014-08-24 Thread moshe zada
What is your Logstash configuration?
Did you try the json codec (http://logstash.net/docs/1.4.2/codecs/json)?
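For reference, the codec is set on the input, or a json filter can parse an already-captured field after the fact; a minimal sketch (the input type and port are illustrative):

```
# Option 1: decode JSON as it arrives on an input
input {
  tcp {
    port  => 5000      # illustrative port
    codec => json
  }
}

# Option 2: decode a captured field with the json filter
filter {
  json {
    source => "message"
  }
}
```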

On Sunday, August 24, 2014 4:54:08 PM UTC+3, Didjit wrote:

 Hi,

 The following is a debug from Logstash:

 {
         "message" => "{\"EventTime\":\"2014-08-24T09:44:46-0400\",\"URI\":\"http://ME/rest/venue/ME/hours/2014-08-24\",\"uri_payload\":{\"value\":[{\"open\":\"2014-08-24T13:00:00.000+\",\"close\":\"2014-08-24T23:00:00.000+\",\"isOpen\":true,\"date\":\"2014-08-24\"}],\"Count\":1}}\r",
        "@version" => "1",
      "@timestamp" => "2014-08-24T13:44:48.036Z",
            "host" => "127.0.0.1:60778",
            "type" => "MY_Detail",
       "EventTime" => "2014-08-24T09:44:46-0400",
             "URI" => "http://ME/rest/venue/ME//hours/2014-08-24",
     "uri_payload" => {
         "value" => [
             [0] {
                   "open" => "2014-08-24T13:00:00.000+",
                  "close" => "2014-08-24T23:00:00.000+",
                 "isOpen" => true,
                   "date" => "2014-08-24"
             }
         ],
         "Count" => 1,
             "0" => {}
     },
            "MYId" => "ME"
 }
 ___

 When I look into Elasticsearch, the fields under uri_payload are not 
 parsed. It shows uri_payload.value as a single field containing:
 {"open":"2014-08-21T13:00:00.000+","close":"2014-08-21T23:00:00.000+","isOpen":true,"date":"2014-08-21"}

 How can I get all the parsed values as fields in Elasticsearch? In my 
 example, the fields open, close and isOpen. Initially I thought Logstash 
 was not parsing all the JSON, but looking at the debug it is.

 Thank you,

 Chris





-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/fe60df4d-cd36-43c9-a08c-7213abc2dd18%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Optimizing queries for a 5 node cluster with 250 M documents (causes OutOfMemory exceptions and GC pauses)

2014-08-24 Thread Jonathan Foy
I ran into the same issue when using Integer.MAX_VALUE as the size 
parameter (migrating from a DB-based search). Perhaps someone can come up 
with a proper reference, I cannot, but according to a comment in this SO 
question 
(http://stackoverflow.com/questions/8829468/elasticsearch-query-to-return-all-records), 
Elasticsearch/Lucene tries to allocate memory for that many 
scores. When I switched those queries to a count/search duo, things 
improved dramatically, as you've already noticed.

On Saturday, August 23, 2014 12:17:47 PM UTC-4, Narendra Yadala wrote:


 I am not returning 2 billion documents :)

 I am returning all documents that match. The actual number can be anywhere 
 between 0 and 50k. I am just fetching documents within a given time 
 interval, such as one hour or one day, and then batch processing them.

 I fixed this by making 2 queries, one to fetch the count and the other for 
 the actual data. It is mentioned in some other thread that the scroll API 
 is performance-intensive, so I did not go for it.

 On Saturday, 23 August 2014 21:32:59 UTC+5:30, Ivan Brusic wrote:

 When I kept size as Integer.MAX_VALUE, it caused all the problems

 Are you trying to return up to 2 billion documents at once? Even if that 
 number was only 1 million, you will face problems. Or did I perhaps 
 misunderstand you?

 Are you sorting the documents based on the score (the default)? 
 Lucene/Elasticsearch would need to keep all the values in memory in order 
 to sort them, causing memory problems. In general, Lucene is not effective 
 at deep pagination. Use scan/scroll:


 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-scroll.html

 -- 
 Ivan


 On Sat, Aug 23, 2014 at 6:46 AM, Narendra Yadala narendr...@gmail.com 
 wrote:

 Hi Jörg,

 This query 
 {
    "query" : {
       "bool": {
          "must": {
             "match" : { "body" : "big" }
          },
          "must_not": {
             "match" : { "body" : "data" }
          },
          "must": {
             "match" : { "id": 521 }
          }
       }
    }
 }

 and this query are performing exactly the same:
 {
    "query" : {
       "bool": {
          "must": {
             "match" : { "body" : "big" }
          },
          "must_not": {
             "match" : { "body" : "data" }
          }
       }
    },
    "filter" : {
       "term" : { "id" : 521 }
    }
 }

 I am not able to understand what makes a filtered query fast. Is there any 
 place where I can find documentation on the internals of how different 
 queries are processed by Elasticsearch?

 On Saturday, 23 August 2014 18:20:23 UTC+5:30, Jörg Prante wrote:

 Before firing queries, you should consider if the index design and 
 query choice is optimal.

 Numeric range queries are not straightforward. They were a major issue 
 on inverted index engines like Lucene/Elasticsearch and it has taken some 
 time to introduce efficient implementations. See e.g. 
 https://issues.apache.org/jira/browse/LUCENE-1673

 ES tries to compensate the downsides of massive numeric range queries 
 by loading all the field values into memory. To achieve effective queries, 
 you have to carefully discretize the values you index. 

 For example, a few hundred millions of different timestamps, with 
 millisecond resolution, are a real burden for searching on inverted 
 indices. A good discretization strategy for indexing is to reduce the 
 total 
 amount of values in such field to a few hundred or thousands. For 
 timestamps, this means, indexing time-based series data in discrete 
 intervals of days, hours, minutes, maybe seconds is much more efficient 
 than e.g. millisecond resolution.
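The discretization described above can be as simple as truncating each timestamp before indexing; a sketch, with the one-minute bucket as an arbitrary example choice:

```python
def truncate_to_minute(epoch_ms: int) -> int:
    # Drop sub-minute resolution: every timestamp within the same minute
    # collapses to one indexed value (60_000 ms per minute), shrinking the
    # number of distinct terms a numeric range query has to visit.
    return epoch_ms - (epoch_ms % 60_000)

print(truncate_to_minute(1408884325088))  # 1408884300000
```
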

 Another topic is to use filters for boolean queries. They are much 
 faster.

 Jörg



 On Sat, Aug 23, 2014 at 2:19 PM, Narendra Yadala narendr...@gmail.com 
 wrote:

 Hi Ivan,

 Thanks for the input about aggregating on strings, I do that, but 
 those queries take time but they do not crash node. 

 The queries which caused problem were pretty straightforward queries 
 (such as a boolean query with two musts, one must is equal match and 
 other 
 a range match on long) but the real problem was with the size. When I 
 kept 
 size as Integer.MAX_VALUE, it caused all the problems. When I removed it, 
 it started working fine. I think it is worth mentioning somewhere about 
 this strange behavior (probably expected but strange).

 I did double up on the RAM though and now I have allocated 5*10G RAM 
 to the cluster. Things are looking ok as of now, except that the 
 aggregations (on strings) are quite slow. May be I would run these 
 aggregations as batch and cache the outputs in a different type and move 
 on 
 for now.

 Thanks
 NY


 On Fri, Aug 22, 2014 at 10:34 PM, Ivan Brusic iv...@brusic.com 
 wrote:

 How expensive are your queries? Are you using aggregations or sorting 
 on string fields that could use up your field data cache? Are you using 
 the 
 defaults for the cache? Post the current usage.

 If you 

Re: Json Data not getting parsed when sent to Elasticsearch

2014-08-24 Thread Didjit
Pretty simple (below). I just added the json codec and tried again, and 
received the same results. Thank you!

elasticsearch { 
    host => "localhost"
    cluster => "cjceswin"
    node_name => "cjcnode"
    codec => "json"
    index => "logstash-dwhse-%{+.MM.dd}"
    workers => 3
}

}








Re: Optimizing queries for a 5 node cluster with 250 M documents (causes OutOfMemory exceptions and GC pauses)

2014-08-24 Thread joergpra...@gmail.com
Exactly. Filters do not use scores. They also use bit sets, which makes them
reusable and fast.

I wasn't talking about a filter added to a query; I meant filtered queries.
This is a huge difference.

This query

{
   "query" : {
      "bool": {
         "must": {
            "match" : { "body" : "big" }
         },
         "must_not": {
            "match" : { "body" : "data" }
         },
         "must": {
            "match" : { "id": 521 }
         }
      }
   }
}

can be turned into this filtered query

{
  "query" : {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            { "match" : { "body" : "big" } },
            { "match" : { "id": 521 } }
          ],
          "must_not": {
            "match" : { "body" : "data" }
          }
        }
      }
    }
  }
}

(plus fixing the duplicate "must" key, which is a potential source of errors)

Jörg



indices.memory.index_buffer_size

2014-08-24 Thread Yongtao You
Hi,

Is the indices.memory.index_buffer_size configuration a cluster-wide 
configuration or a per-node configuration? Do I need to set it on every node? 
Or just the master (eligible) node?

Thanks.
Yongtao

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/f67e3a30-521c-4c13-8620-c79133cea01c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Topics/Entities with relevancy scores and searching

2014-08-24 Thread Scott Decker
Interesting.
So, set a payload on the term (in this case the topic/entity) where the 
payload is the relevancy value. Then you can do your function score on the 
query of the main documents themselves, no need for parent/child.
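(A minimal sketch of the indexing side of that idea — the topic names and the
`token|payload` formatting follow the delimited-payload token filter's default
`|` delimiter; the helper name is my own:)

```python
def topics_to_payload_field(topic_scores):
    """Format (topic, relevancy) pairs for a delimited-payload field:
    whitespace-separated tokens, each carrying its payload after '|'."""
    return " ".join("%s|%s" % (topic, score) for topic, score in topic_scores)

field_value = topics_to_payload_field([("barack_obama", 0.93345),
                                       ("apple_inc", 0.0034)])
print(field_value)  # barack_obama|0.93345 apple_inc|0.0034
```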

Have you done this? Any concerns about performance with this sort of scoring, 
or is it just as fast as base Lucene scoring if we override the score function 
and just use our own?
-- we will of course try it and run our own performance tests, just looking 
to see if you already have any insights.

Super helpful!
Scott


On Saturday, August 23, 2014 7:50:18 AM UTC-7, Clinton Gormley wrote:

 Have a look at:

 * 
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-delimited-payload-tokenfilter.html
 * 
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html




 On 23 August 2014 15:04, Scott Decker sc...@publishthis.com
  wrote:

 Hey all,
   a question on possible search paths/structure.  If we have a text 
 document, and we have run our magic over it and come away with Topics and 
 Entities (Like, Barack Obama and Apple Inc.) and we have a relevancy score 
 for each one, what would be the best way to store and query against them?

 we currently are trying a parent/child relationship, where the children 
 are the terms with their relevancy score and the scoring of the parent text 
 document gets done from the relevancy scores of the children. That works. 
 Just worried about speed of parent/child against millions of documents.

 Another way we could think of was, build our own scorer/analyzer.  If we 
 are reading in tokens like BarackObama.93345|AppleInc.0034
 where it has the topic and the relevancy score to the document in it, i 
 can build an analyzer to read those sorts of tokens, but is there any way 
 to build a scorer that can use that token match data to score?

 And third, is there any other way to normalize this data into one
 document so we can score on it? That seems like it would be the fastest way
 to query, but my #2 option here is the only way I can think of doing it.
 Is anyone else tagging their documents with relevancy scores for topics on the
 document, and then letting people search for those topics and pulling back
 the relevant docs based on the per-document relevancy scores?

 Thanks,
 Scott







Re: Boost the first word in a multi-word query

2014-08-24 Thread Jérémy
Thanks Vineeth, I can certainly build something with the query string :-)
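One way to "build something" like that (my own sketch, not from the thread):
give each word a boost that decreases with its position, then hand the result
to a query_string query, so earlier words dominate the score while all words
stay optional:

```python
def positional_boosts(text):
    """Build a query_string query where earlier words get higher boosts:
    for N words, the first gets boost N and the last gets boost 1."""
    words = text.split()
    n = len(words)
    boosted = " ".join("%s^%d" % (w, n - i) for i, w in enumerate(words))
    return {"query": {"query_string": {"query": boosted}}}

q = positional_boosts("brown dog")
print(q["query"]["query_string"]["query"])  # brown^2 dog^1
```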


On Fri, Aug 22, 2014 at 8:50 PM, vineeth mohan vm.vineethmo...@gmail.com
wrote:

 Hello Jeremy ,

 You can try query_string then.

  Query as "Brown^2 dog"


 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#query-dsl-query-string-query

 Thanks
Vineeth


 On Sat, Aug 23, 2014 at 12:11 AM, Jérémy mer...@gmail.com wrote:

 Thanks for your answer!

 Unfortunately the phrase query is not enough, because I still want to
 keep words optional. In my understanding, the phrase query requires all the
 words of the query to be present.

 Cheers,
 Jeremy


 On Fri, Aug 22, 2014 at 8:20 PM, vineeth mohan vm.vineethmo...@gmail.com
  wrote:

 Hello Jeremy ,

 I feel what you are looking for is a phrase query . It takes into
 consideration the order of words -
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html#_phrase

 Thanks
   Vineeth


 On Fri, Aug 22, 2014 at 3:28 PM, Jeremy mer...@gmail.com wrote:

 In case of a multi-word query, is there a way to boost the first terms
 of the query?

 For example, in the following query:
 GET /my_index/my_type/_search
 {
   "query": {
     "match": {
       "title": "BROWN DOG!"
     }
   }
 }

 "Brown" should be prioritized over "dog", therefore searching for
 "brown dog" should not return the same scores as searching for "dog brown".
 I'm ideally looking for a solution which works with N words and weights
 them according to their position.

 Regards,
 Jeremy





Re: One large index vs. many smaller indexes

2014-08-24 Thread Chris Neal
Adrien,

Thanks so much for the response.  It was very helpful.  I will check out
those links on capacity planning for sure.

One followup question.  You mention that tens of shards per node would be
ok.  Are you meaning tens of shards from tens of indexes?  Or tens of
shards for a single index?  Right now I have two servers configured with
the index getting 2 shards (one per server), and 1 replica (per server).

Chris


On Fri, Aug 22, 2014 at 5:58 PM, Adrien Grand 
adrien.gr...@elasticsearch.com wrote:

 Hi Chris,

 Usually, the problem is not that much in terms of indices but shards,
 which are the physical units of data storage (an index being a logical view
 over several shards).

 Something to beware of is that shards typically have some constant
 overhead (disk space, file descriptors, memory usage) that does not depend
 on the amount of data that they store. Although it would be ok to have up
 to a few tens of shards per node, you should avoid having e.g. thousands
 of shards per node.

 If you plan on always adding a filter for a specific application in your
 search requests, then splitting by application makes sense, since it makes
 the filter unnecessary at search time: you just query the
 application-specific index. On the other hand, if you don't filter by
 application, then splitting the data yourself into smaller indices would be
 pretty much equivalent to storing everything in a single index with a higher
 number of shards.

 You might want to check out the following resources that talk about
 capacity planning:
  - http://www.elasticsearch.org/videos/big-data-search-and-analytics/
  -
 http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/capacity-planning.html



 On Fri, Aug 22, 2014 at 9:08 PM, Chris Neal chris.n...@derbysoft.net
 wrote:

 Hi all,

 As the subject says, I'm wondering about index size vs. number of indexes.

 I'm indexing many application log files, currently with an index by day
 for all logs, which will make a very large index.  For just a few
 applications in Development, the index is 55GB a day (across 2 servers).
  In prod with all applications, it will be much more than that.  1TB a
 day maybe?

 I'm wondering if there is value in splitting the indexes by day and by
 application, which would produce more indexes per day, but they would be
 smaller, vs. value in having a single, mammoth index by day alone.

 Is it just a resource question?  If I have enough RAM/disk/CPU to support
 a mammoth index, then am I fine?  Or are there other reasons to (or not to)
 split up indexes?

 Very much appreciate your time.
 Chris





 --
 Adrien Grand





Re: indices.memory.index_buffer_size

2014-08-24 Thread Mark Walkom
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-indices.html
states: "It is a global setting that bubbles down to all the different
shards allocated on a specific node."

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 25 August 2014 03:12, Yongtao You yongtao@gmail.com wrote:

 Hi,

 Is the indices.memory.index_buffer_size configuration a cluster wide
 configuration or per node configuration? Do I need to set it on every node?
 Or just the master (eligible) node?

 Thanks.
 Yongtao





Re: What fields does ElasticSearch map by default?

2014-08-24 Thread vineeth mohan
Hello Albert ,

Few things here:


   1. Yes, you can tell Elasticsearch which fields to index and which not
   to. You can use the "index": "yes"/"no" property for each field in the
   schema to specify this. -
   
http://stackoverflow.com/questions/13626617/specify-which-fields-are-indexed-in-elasticsearch
   2. There is a concept of _all in Elasticsearch. This is a super-set of
   all field values, and to search on the entire document you can simply
   search on the _all field. -
   
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-all-field.html#mapping-all-field
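
(A minimal sketch of point 1 as an ES 1.x-style mapping — the field names
"camera_model" and "exif_blob" are invented for illustration; "index": "no"
keeps a field in _source but makes it unsearchable:)

```python
import json

# Illustrative mapping: "camera_model" is searchable, while "exif_blob"
# is kept in the stored _source but not indexed ("index": "no"),
# so it cannot be queried.
mapping = {
    "image": {
        "properties": {
            "camera_model": {"type": "string"},
            "exif_blob": {"type": "string", "index": "no"},
        }
    }
}
print(json.dumps(mapping, indent=2))
```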

Thanks
 Vineeth


On Sun, Aug 24, 2014 at 4:07 AM, Albert Lim albertlim...@gmail.com wrote:

 I'm trying to create an image metadata store, and obviously a single image
 can have 20 or more metadata fields.

 So if I enter this document into ElasticSearch, will it index/map all
 those fields? Such that I can query for every field? Or can I tell
 ElasticSearch what to index or not?





Elasticsearch Function Score not working with object type

2014-08-24 Thread Pablo Musa
Hey guys,
I am trying to use the function score but I am getting the following error:

ElasticsearchIllegalArgumentException[No field found for [fsot] in mapping
with types [tst]];

I have used function score before and it worked like a charm so I started
digging what was wrong. I found out that it does not work with object type.

Am I doing something wrong? What am I missing here?

The following gist contains an example and the error I received.
https://gist.github.com/pmusa/ef9a02210d736ee020d9

Thanks in advance,
Pablo Musa



Re: DOS attack Elasticsearch with Mappings

2014-08-24 Thread Nikolas Everett
If the cluster is that open to users, I don't think it'd be easy to prevent
a malicious user from intentionally DOSing it. But in this case I think you
could make the default for all fields be non-dynamic. That way users have
to intentionally send all mapping updates. It'd prevent this sort of
unintentional DOS.

I think this is a setting that you can change, and I think that it would
only affect new indexes, but I admit to not having done it and am going from
a vague memory of seeing a setting somewhere.
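(The mechanism Nik is half-remembering is probably dynamic-mapping control; as
a hedged sketch of an ES 1.x type mapping, "dynamic": "strict" makes documents
with unmapped fields be rejected instead of growing the mapping — the type and
field names below are invented for illustration:)

```python
# With "dynamic": "strict", indexing a document that contains a field not
# listed under "properties" is rejected rather than auto-mapped, which
# blocks dictionary-shaped documents from exploding the cluster state.
mapping = {
    "logs": {
        "dynamic": "strict",
        "properties": {
            "message": {"type": "string"},
            "timestamp": {"type": "date"},
        }
    }
}
print(mapping["logs"]["dynamic"])  # strict
```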

Nik
On Aug 24, 2014 11:08 PM, Joshua Montgomery josh1s4l...@gmail.com wrote:

 So an Elasticsearch clusters I help run had an interesting issue last week
 around mappings and I wanted to get the communities thoughts about how to
 handle it.

 *Issue:*
 Our cluster one morning went into utter chaos for no apparent reason. We
 had nodes dropping constantly (master and data nodes) and lots of
 network exceptions in our log files. The cluster kept going red from all
 the dropped nodes and was totally unresponsive to external commands.

 *Some Backgound:*
 Our cluster is fairly open to our users, meaning they can index whatever
 they want without needing approval (this may have to change based on what
 happened). The content stored is usually generated from .Net objects and
 serialized using the Newtonsoft JSON serializer.

 *Cause:*
 After 6hrs of investigation while trying to get our cluster stable, this
 is what we found:

 We had a new document type (around 30,000 documents) indexed into the
 cluster over a 1 hour window containing the .Net equivalent of a dictionary
 in json format. When a dictionary is serialized to json, it ends up with a
 json object containing a list of properties and values. The current
 behavior of Elasticsearch is to generate a mapping definition for each
 field name in a json object. So when you serialize a dictionary, it means
 every 'key' in the dictionary gets its own mapping definition. It turns out
 this can lead to nasty consequences when indexed in Elasticsearch...

 Essentially, every document contained its own list of unique keys which
 resulted in Elasticsearch generating mapping definitions for all the keys.
 We found this out by noticing that the json type with the dictionary
 continuously kept having its mappings updated (based on the master node log
 files). The continual updating of the mappings (which is part of the
 overall state file) caused the master nodes to lock up on the updates,
 effectively stopping all other cluster operations. The state file, upon
 further investigation, was over 70MB by the time we ended up stopping
 the cluster. Stopping the cluster was the only way to stop updates to the
 mappings. The large mapping file we suspect was one of the major reasons
 for nodes dropping; connections would timeout during the large file copy
 (i'm assuming the state is passed around the nodes in the cluster).

 *Solution:*
 As previously mentioned we had to stop the cluster. We then had to make
 sure that all indexing operations were stopped. Upon restarting the cluster
 we deleted all documents of the poisonous document type (which took a
 while). This resulted is a much smaller state file and a stable cluster.

 *Prevention:*
 So this is my real question for the community, what is the correct action
 for preventing this in the future (or does it already exist). We could
 obviously start more closely reviewing what goes into our cluster, but
 should there be a feature in Elasticsearch to prevent this (assuming it
 doesn't already exist)? I'm assuming that there are a number of users who
 have clusters where they don't review everything that goes into their
 cluster. So would it make sense to have Elasticsearch provide some feature
 to prevent this issue, which is the equivalent to a DOS attack on the
 cluster?

 Thanks for reading this and I look forward to your responses!

 -Josh Montgomery





Re: Need some advice to build a log central.

2014-08-24 Thread vineeth mohan
Hello Sang ,

As this is a question-and-answer forum, we highly recommend that you take a
shot yourself and post questions if you hit a dead end.

Thanks
   Vineeth


On Mon, Aug 25, 2014 at 7:56 AM, Sang Dang zkid...@gmail.com wrote:

 Hi All,
 I am going to build a log central using ElasticSearch.
 I need some advice from anyone who have built it already.








Re: Elasticsearch Function Score not working with object type

2014-08-24 Thread vineeth mohan
Hello Pablo ,

Lucene (the underlying library on which ES is built) has only a key-value
concept and does not keep object-level information. This means that on the
Lucene side the data would be stored as

fsot.testObjects : [ test1, test2 ]

and there is no field named fsot on the Lucene side. This means that you
need to give the field name as fsot.testObjects rather than fsot.

Thanks
 Vineeth


On Mon, Aug 25, 2014 at 7:57 AM, Pablo Musa pablitom...@gmail.com wrote:

 Hey guys,
 I am trying to use the function score but I am getting the following error:

 ElasticsearchIllegalArgumentException[No field found for [fsot] in mapping
 with types [tst]];

 I have used function score before and it worked like a charm so I started
 digging what was wrong. I found out that it does not work with object type.

 Am I doing something wrong? What am I missing here?

 The following gist contains an example and the error I received.
 https://gist.github.com/pmusa/ef9a02210d736ee020d9

 Thanks in advance,
 Pablo Musa





Re: Elasticsearch Function Score not working with object type

2014-08-24 Thread pablitomusa
It worked. Thank you very much.

* copying the final code for future reference:

POST test/tst/_search
{
  "query": {
    "function_score": {
      "boost_mode": "replace",
      "query": {
        "filtered": {
          "query": {
            "match_all": {}
          },
          "filter": {
            "exists": {
              "field": "fsot"
            }
          }
        }
      },
      "functions": [
        {
          "script_score": {
            "script": "if ( doc.get('fsot.testobj') == null ) 0; else 1;"
          }
        }
      ]
    }
  }
}







Re: Error running ES DSL in hadoop mapreduce

2014-08-24 Thread Sona Samad
Hi Adrien,
 
My elasticsearch version is :  elasticsearch-1.2.1 
 
The Maven dependency for hadoop:
 
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-hadoop-mr</artifactId>
  <version>2.0.1</version>
</dependency>
 
 
The full stack trace is given below:
 
[2014-08-25 09:31:58,892][DEBUG][action.search.type   ] [Thane Ector] [mr][4], node[1ZbXSvkKQC-kDvgMXuC8iQ], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@6ed78f6d]
org.elasticsearch.search.query.QueryPhaseExecutionException: [mr][4]: query[ConstantScore(cache(_type:logs))],from[0],size[50]: Query Failed [Failed to execute main query]
 at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:162)
 at org.elasticsearch.search.SearchService.executeScan(SearchService.java:215)
 at org.elasticsearch.search.action.SearchServiceTransportAction$19.call(SearchServiceTransportAction.java:444)
 at org.elasticsearch.search.action.SearchServiceTransportAction$19.call(SearchServiceTransportAction.java:441)
 at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:517)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 97
 at org.elasticsearch.common.util.BigArrays$IntArrayWrapper.set(BigArrays.java:185)
 at org.elasticsearch.search.aggregations.metrics.cardinality.HyperLogLogPlusPlus$Hashset.values(HyperLogLogPlusPlus.java:499)
 at org.elasticsearch.search.aggregations.metrics.cardinality.HyperLogLogPlusPlus.upgradeToHll(HyperLogLogPlusPlus.java:307)
 at org.elasticsearch.search.aggregations.metrics.cardinality.HyperLogLogPlusPlus.collectLcEncoded(HyperLogLogPlusPlus.java:245)
 at org.elasticsearch.search.aggregations.metrics.cardinality.HyperLogLogPlusPlus.collectLc(HyperLogLogPlusPlus.java:239)
 at org.elasticsearch.search.aggregations.metrics.cardinality.HyperLogLogPlusPlus.collect(HyperLogLogPlusPlus.java:231)
 at org.elasticsearch.search.aggregations.metrics.cardinality.CardinalityAggregator$DirectCollector.collect(CardinalityAggregator.java:204)
 at org.elasticsearch.search.aggregations.metrics.cardinality.CardinalityAggregator.collect(CardinalityAggregator.java:118)
 at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectBucketNoCounts(BucketsAggregator.java:74)
 at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectExistingBucket(BucketsAggregator.java:63)
 at org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator.collect(GlobalOrdinalsStringTermsAggregator.java:98)
 at org.elasticsearch.search.aggregations.AggregationPhase$AggregationsCollector.collect(AggregationPhase.java:157)
 at org.elasticsearch.common.lucene.MultiCollector.collect(MultiCollector.java:60)
 at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:193)
 at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:163)
 at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:35)
 at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
 at org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:175)
 at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:309)
 at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:116)
 ... 7 more
[2014-08-25 09:31:58,894][DEBUG][action.search.type   ] [Thane Ector] All shards failed for phase: [init_scan]
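The "Caused by" above lands inside the cardinality aggregation's HyperLogLog++ sketch, on the path that upgrades from linear counting to HLL (upgradeToHll). For what it's worth, the cardinality aggregation does expose a precision_threshold option that controls that linear-counting/HLL trade-off; whether setting it explicitly sidesteps this particular exception is an assumption on my part, not a verified fix. A minimal Python sketch of the sub-aggregation body:

```python
import json

# Sketch: the same cardinality sub-aggregation from the query below, with an
# explicit precision_threshold. precision_threshold is a documented option of
# the cardinality aggregation; treating it as a workaround for this bug is an
# assumption, not a confirmed fix.
cardinality_agg = {
    "cardinality": {
        "field": "ExamRowKey",
        "precision_threshold": 100,  # counts below this are near-exact
    }
}

print(json.dumps(cardinality_agg, indent=2))
```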
 
Thanks,
Sona
 

On Friday, August 22, 2014 5:07:33 PM UTC+5:30, Sona Samad wrote:

 Hi,

 I was trying to run the query below from a Hadoop MapReduce job:

 {
   "aggs": {
     "group_by_body_part": {
       "terms": {
         "field": "body_part",
         "size": 5,
         "order": { "examcount": "desc" }
       },
       "aggs": {
         "examcount": {
           "cardinality": {
             "field": "ExamRowKey"
           }
         }
       }
     }
   }
 }
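For reference, the request above can be reconstructed as a plain dict and sent straight to the _search endpoint with a top-level "size": 0, which suppresses raw hits so only aggregation buckets come back (the inner terms "size" only caps the number of buckets). This is a sketch of the request body, not the es-hadoop code path:

```python
import json

# Reconstruction of the quoted aggregation request. The top-level "size": 0
# keeps raw documents out of the response; the terms aggregation's own
# "size": 5 limits how many buckets are returned after shard results merge.
body = {
    "size": 0,  # no raw hits, aggregation results only
    "aggs": {
        "group_by_body_part": {
            "terms": {
                "field": "body_part",
                "size": 5,
                "order": {"examcount": "desc"},
            },
            "aggs": {
                "examcount": {
                    "cardinality": {"field": "ExamRowKey"}
                }
            },
        }
    },
}

print(json.dumps(body))
```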

 The query returns more than 5 records, even though the size is set to 5.
 Also, the results are not aggregated; instead, entire records from the
 index are passed to the mapper as values.

 Also, the following error is logged:

 [2014-08-22 16:06:21,459][DEBUG][action.search.type   ] [Algrim the Strong] All shards failed for phase: [init_scan]
 [2014-08-22 16:26:38,875][DEBUG][action.search.type   ] [Algrim the Strong] [mr][0], node[r9u9daW_TkqTBBeazKJQNw], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@31b5b771]
 org.elasticsearch.search.query.QueryPhaseExecutionException: [mr][0]: query[ConstantScore(cache(_type:logs))],from[0],size[50]: Query Failed [Failed to execute main query]
 at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:162)
 at