Re: Elasticsearch shuts down for no reason

2014-08-24 Thread Eitan Vesely

Thanks Mark,

auth.log doesn't show any login or sudo at the time Elasticsearch 
stopped...
nothing else is running on that machine - it is a dedicated ES server.

What I did find in the auth log is that someone is trying to hack into the 
system, but I don't see what that has to do with Elasticsearch stopping?

On Sunday, August 24, 2014 4:35:41 AM UTC+3, Mark Walkom wrote:

 Something is stopping the service.

 If you are on linux check the auth log, if anyone is using sudo to stop it 
 then you will see that logged. Otherwise, what else runs on the machine?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 24 August 2014 06:15, Eitan Vesely eita...@gmail.com wrote:

 Hi Guys,
 I installed ES a month ago and it has been working just fine.

 Today ES just went down, with no visible cause.

 Here is what I see in the log file:

 [2014-08-23 16:47:11,272][DEBUG][action.search.type   ] [Plunderer] 
 [g30nm0bi2j663tgu6ud][1], node[Vc4xSuh1S1qQOvQdv-wD_A], [P], s[STARTED]: 
 Failed to execute [org.elasticsearch.action.search.SearchRequest@5531dfad] 
 lastShard [true]
 org.elasticsearch.search.SearchParseException: [g30nm0bi2j663tgu6ud][1]: 
 from[-1],size[-1]: Parse Failure [Failed to parse source 
 [{facets:{0:{date_histogram:{key_field:@timestamp,value_field:user_count,interval:1h},global:true,facet_filter:{fquery:{query:{filtered:{query:{query_string:{query:*}},filter:{bool:{must:[{range:{@timestamp:{from:1407602785182,to:1408812385182}}},{range:{@timestamp:{from:1408516424602,to:1408811520255}}}],size:0}]]
  at 
 org.elasticsearch.search.SearchService.parseSource(SearchService.java:649)
 at 
 org.elasticsearch.search.SearchService.createContext(SearchService.java:511)
  at 
 org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:483)
 at 
 org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:252)
  at 
 org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:206)
 at 
 org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:203)
  at 
 org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:517)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: org.elasticsearch.search.facet.FacetPhaseExecutionException: 
 Facet [0]: (key) field [@timestamp] not found
 at 
 org.elasticsearch.search.facet.datehistogram.DateHistogramFacetParser.parse(DateHistogramFacetParser.java:160)
  at 
 org.elasticsearch.search.facet.FacetParseElement.parse(FacetParseElement.java:93)
 at 
 org.elasticsearch.search.SearchService.parseSource(SearchService.java:633)
  ... 9 more
 [2014-08-23 16:47:11,273][DEBUG][action.search.type   ] [Plunderer] 
 [g30nm0bi2j663tgu6ud][0], node[Vc4xSuh1S1qQOvQdv-wD_A], [P], s[STARTED]: 
 Failed to execute [org.elasticsearch.action.search.SearchRequest@5531dfad]
 org.elasticsearch.search.SearchParseException: [g30nm0bi2j663tgu6ud][0]: 
 from[-1],size[-1]: Parse Failure [Failed to parse source 
 [{facets:{0:{date_histogram:{key_field:@timestamp,value_field:user_count,interval:1h},global:true,facet_filter:{fquery:{query:{filtered:{query:{query_string:{query:*}},filter:{bool:{must:[{range:{@timestamp:{from:1407602785182,to:1408812385182}}},{range:{@timestamp:{from:1408516424602,to:1408811520255}}}],size:0}]]
  at 
 org.elasticsearch.search.SearchService.parseSource(SearchService.java:649)
 at 
 org.elasticsearch.search.SearchService.createContext(SearchService.java:511)
  at 
 org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:483)
 at 
 org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:252)
  at 
 org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:206)
 at 
 org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:203)
  at 
 org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:517)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: org.elasticsearch.search.facet.FacetPhaseExecutionException: 
 Facet [0]: (key) field [@timestamp] not found
 at 
 org.elasticsearch.search.facet.datehistogram.DateHistogramFacetParser.parse(DateHistogramFacetParser.java:160)
  at 
 org.elasticsearch.search.facet.FacetParseElement.parse(FacetParseElement.java:93)
 at 
 org.elasticsearch.search.SearchService.parseSource(SearchService.java:633)
  ... 9 more
 

Re: Elasticsearch shuts down for no reason

2014-08-24 Thread Eitan Vesely

I did find the shutdown request in the syslog:

Aug 23 16:49:01 medisafelog2 kernel: [3361057.489168] hv_utils: Shutdown 
request received - graceful shutdown initiated

Yet I have no idea who or what initiated it... how can I dig in?


Re: Elasticsearch shuts down for no reason

2014-08-24 Thread joergpra...@gmail.com
The company providing the hosting service shut down the virtual machine;
hv_utils is a message from the Hyper-V hypervisor.

This is not related to Elasticsearch at all.

Jörg


On Sun, Aug 24, 2014 at 11:19 AM, Mark Walkom ma...@campaignmonitor.com
wrote:

 What version of ES are you running?
 Are you running on a hosting service, and if so do you have a firewall
 protecting the host - i.e. it's not open to the entire internet?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com



date_histogram facet float possible overflow

2014-08-24 Thread moshe zada
 

Hi all,

I am using the ELK stack to visualise our monitoring data. Yesterday I came 
across a weird problem: the Elasticsearch date_histogram facet returned 
floating-point results that look like an overflow (min: 4.604480259023595E18).
Our data flow is: collectd (cpu/memory) -> riemann -> logstash -> elasticsearch.

At first the values were correct; after a few days the values became huge 
(see attached snapshot of the Kibana graph).

*filtered query + Result:*

*query:*
curl -XGET 'http://localhost:9200/logstash-2014.08.24/_search?pretty' -d '{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "should": [
            {
              "query_string": {
                "query": "subservice.raw:\"processes-cpu_percent/gauge-collectd\" AND (plugin_instance:\"cpu_percent\")"
              }
            }
          ]
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "@timestamp": {
                  "from": 1408884312966,
                  "to": 1408884612966
                }
              }
            },
            {
              "range": {
                "@timestamp": {
                  "from": 1408884311948,
                  "to": 1408884327941
                }
              }
            },
            {
              "fquery": {
                "query": {
                  "query_string": {
                    "query": "subservice:(\"processes-cpu_percent/gauge-collectd\")"
                  }
                },
                "_cache": false
              }
            }
          ]
        }
      }
    }
  },
  "size": 500,
  "sort": [
    {
      "metric": {
        "order": "desc",
        "ignore_unmapped": false
      }
    },
    {
      "@timestamp": {
        "order": "desc",
        "ignore_unmapped": false
      }
    }
  ]
}'




*result:*
{
  "took" : 47,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : null,
    "hits" : [ {
      "_index" : "logstash-2014.08.24",
      "_type" : "gauge",
      "_id" : "SlzG8bGJQziU0LMoN7nrbQ",
      "_score" : null,
      "_source" : {"host":"host1","service":"instance-2014-08-24T1106/processes-cpu_percent/gauge-collectd","state":null,"description":null,"metric":0.7,"tags":["collectd"],"time":"2014-08-24T12:45:25.000Z","ttl":20.0,"type":"gauge","source":"host1","ds_type":"gauge","plugin_instance":"cpu_percent","ds_name":"value","type_instance":"collectd","plugin":"processes","ds_index":0,"@version":"1","@timestamp":"2014-08-24T12:45:15.079Z"},
      "sort" : [ 4604480259023595110, 1408884325088 ]
    }, {
      "_index" : "logstash-2014.08.24",
      "_type" : "gauge",
      "_id" : "8hxToMjpQ5WQIw15DQqIGA",
      "_score" : null,
      "_source" : {"host":"host1","service":"instance-2014-08-24T1106/processes-cpu_percent/gauge-collectd","state":null,"description":null,"metric":0.5,"tags":["collectd"],"time":"2014-08-24T12:45:15.000Z","ttl":20.0,"type":"gauge","source":"host1","ds_type":"gauge","plugin_instance":"cpu_percent","ds_name":"value","type_instance":"collectd","plugin":"processes","ds_index":0,"@version":"1","@timestamp":"2014-08-24T12:45:15.079Z"},
      "sort" : [ 4602678819172646912, 1408884315079 ]
    } ]
  }
}
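The two huge sort values above are suspicious in a very specific way: 4604480259023595110 and 4602678819172646912 are exactly the raw IEEE-754 bit patterns of the doubles 0.7 and 0.5, the metric values of the two hits. That suggests the sort values are the doubles' long bits rather than the numeric values themselves. A quick check in Python:

```python
import struct

def double_bits(x: float) -> int:
    # Reinterpret the 8 bytes of an IEEE-754 double as a signed 64-bit integer.
    return struct.unpack("<q", struct.pack("<d", x))[0]

print(double_bits(0.7))  # 4604480259023595110 -- the first hit's sort value
print(double_bits(0.5))  # 4602678819172646912 -- the second hit's sort value
```
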




*date histogram facet + result:*

*query:*
curl -XGET 'http://localhost:9200/logstash-2014.08.24/_search?pretty' -d '{
  "facets": {
    "0": {
      "date_histogram": {
        "key_field": "@timestamp",
        "value_field": "metric",
        "interval": "1s"
      },
      "global": true,
      "facet_filter": {
        "fquery": {
          "query": {
            "filtered": {
              "query": {
                "query_string": {
                  "query": "subservice.raw:\"processes-cpu_percent/gauge-collectd\" AND (plugin_instance:cpu_percent) AND *"
                }
              },
              "filter": {
                "bool": {
                  "must": [
                    {
                      "range": {
                        "@timestamp": {
                          "from": 1408884199622,
                          "to": 1408884499623
                        }
                      }
                    },
                    {
                      "range": {
                        "@timestamp": {
                          "from": 1408884311948,
                          "to": 1408884327941
                        }
                      }
                    },
                    {
                      "fquery": {
                        "query": {
                          "query_string": {
                            "query": "subservice:(\"processes-cpu_percent/gauge-collectd\")"
                          }
                        },
                        "_cache": true
                      }
                    }
                  ]
                }
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}'



*result:*
{
  "took" : 24,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1197141,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "facets" : {
    "0" : {
      "_type" : "date_histogram",
      "entries" : [ {

Re: Json Data not getting parsed when sent to Elasticsearch

2014-08-24 Thread moshe zada
What is your Logstash configuration?
Did you try the json codec (http://logstash.net/docs/1.4.2/codecs/json)?
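For reference, the codec is set on the input, or a json filter can parse an already-captured field after the fact; a minimal sketch (the input type and port are illustrative):

```
# Option 1: decode JSON as it arrives on an input
input {
  tcp {
    port  => 5000      # illustrative port
    codec => json
  }
}

# Option 2: decode a captured field with the json filter
filter {
  json {
    source => "message"
  }
}
```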

On Sunday, August 24, 2014 4:54:08 PM UTC+3, Didjit wrote:

 Hi,

 The following is a debug from Logstash:

 {
         "message" => "{\"EventTime\":\"2014-08-24T09:44:46-0400\",\"URI\":\"http://ME/rest/venue/ME/hours/2014-08-24\",\"uri_payload\":{\"value\":[{\"open\":\"2014-08-24T13:00:00.000+\",\"close\":\"2014-08-24T23:00:00.000+\",\"isOpen\":true,\"date\":\"2014-08-24\"}],\"Count\":1}}\r",
        "@version" => "1",
      "@timestamp" => "2014-08-24T13:44:48.036Z",
            "host" => "127.0.0.1:60778",
            "type" => "MY_Detail",
       "EventTime" => "2014-08-24T09:44:46-0400",
             "URI" => "http://ME/rest/venue/ME//hours/2014-08-24",
     "uri_payload" => {
         "value" => [
             [0] {
                   "open" => "2014-08-24T13:00:00.000+",
                  "close" => "2014-08-24T23:00:00.000+",
                 "isOpen" => true,
                   "date" => "2014-08-24"
             }
         ],
         "Count" => 1,
             "0" => {}
     },
            "MYId" => "ME"
 }
 ___

 When I look into Elasticsearch, the fields under uri_payload are not 
 parsed. It shows uri_payload.value as a single field containing:
 {"open":"2014-08-21T13:00:00.000+","close":"2014-08-21T23:00:00.000+","isOpen":true,"date":"2014-08-21"}

 How can I get all the parsed values as fields in Elasticsearch? In my 
 example, the fields open, close and isOpen. Initially I thought Logstash 
 was not parsing all the JSON, but looking at the debug it is.

 Thank you,

 Chris





-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/fe60df4d-cd36-43c9-a08c-7213abc2dd18%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Optimizing queries for a 5 node cluster with 250 M documents (causes OutOfMemory exceptions and GC pauses)

2014-08-24 Thread Jonathan Foy
I ran into the same issue when using Integer.MAX_VALUE as the size 
parameter (migrating from a DB-based search). Perhaps someone can come up 
with a proper reference, I cannot, but according to a comment in this SO 
question 
(http://stackoverflow.com/questions/8829468/elasticsearch-query-to-return-all-records), 
Elasticsearch/Lucene tries to allocate memory for that many 
scores. When I switched those queries to a count/search duo, things 
improved dramatically, as you've already noticed.

On Saturday, August 23, 2014 12:17:47 PM UTC-4, Narendra Yadala wrote:


 I am not returning 2 billion documents :)

 I am returning all documents that match. The actual number can be anywhere 
 between 0 and 50k. I am just fetching documents within a given time 
 interval, such as one hour or one day, and then batch processing them.

 I fixed this by making 2 queries, one to fetch the count and the other for 
 the actual data. It is mentioned in some other thread that the scroll API 
 is performance-intensive, so I did not go for it.

 On Saturday, 23 August 2014 21:32:59 UTC+5:30, Ivan Brusic wrote:

 When I kept size as Integer.MAX_VALUE, it caused all the problems

 Are you trying to return up to 2 billion documents at once? Even if that 
 number was only 1 million, you will face problems. Or did I perhaps 
 misunderstand you?

 Are you sorting the documents based on the score (the default)? 
 Lucene/Elasticsearch would need to keep all the values in memory in order 
 to sort them, causing memory problems. In general, Lucene is not effective 
 at deep pagination. Use scan/scroll:


 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-scroll.html

 -- 
 Ivan


 On Sat, Aug 23, 2014 at 6:46 AM, Narendra Yadala narendr...@gmail.com 
 wrote:

 Hi Jörg,

 This query 
 {
    "query" : {
       "bool": {
          "must": {
             "match" : { "body" : "big" }
          },
          "must_not": {
             "match" : { "body" : "data" }
          },
          "must": {
             "match" : { "id": 521 }
          }
       }
    }
 }

 and this query are performing exactly the same:
 {
    "query" : {
       "bool": {
          "must": {
             "match" : { "body" : "big" }
          },
          "must_not": {
             "match" : { "body" : "data" }
          }
       }
    },
    "filter" : {
       "term" : { "id" : 521 }
    }
 }

 I am not able to understand what makes a filtered query fast. Is there any 
 place where I can find documentation on the internals of how different 
 queries are processed by Elasticsearch?

 On Saturday, 23 August 2014 18:20:23 UTC+5:30, Jörg Prante wrote:

 Before firing queries, you should consider if the index design and 
 query choice is optimal.

 Numeric range queries are not straightforward. They were a major issue 
 on inverted index engines like Lucene/Elasticsearch and it has taken some 
 time to introduce efficient implementations. See e.g. 
 https://issues.apache.org/jira/browse/LUCENE-1673

 ES tries to compensate the downsides of massive numeric range queries 
 by loading all the field values into memory. To achieve effective queries, 
 you have to carefully discretize the values you index. 

 For example, a few hundred millions of different timestamps, with 
 millisecond resolution, are a real burden for searching on inverted 
 indices. A good discretization strategy for indexing is to reduce the 
 total 
 amount of values in such field to a few hundred or thousands. For 
 timestamps, this means, indexing time-based series data in discrete 
 intervals of days, hours, minutes, maybe seconds is much more efficient 
 than e.g. millisecond resolution.
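The discretization described above can be as simple as truncating each timestamp before indexing; a sketch, with the one-minute bucket as an arbitrary example choice:

```python
def truncate_to_minute(epoch_ms: int) -> int:
    # Drop sub-minute resolution: every timestamp within the same minute
    # collapses to one indexed value (60_000 ms per minute), shrinking the
    # number of distinct terms a numeric range query has to visit.
    return epoch_ms - (epoch_ms % 60_000)

print(truncate_to_minute(1408884325088))  # 1408884300000
```
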

 Another topic is to use filters for boolean queries. They are much 
 faster.

 Jörg



 On Sat, Aug 23, 2014 at 2:19 PM, Narendra Yadala narendr...@gmail.com 
 wrote:

 Hi Ivan,

 Thanks for the input about aggregating on strings, I do that, but 
 those queries take time but they do not crash node. 

 The queries which caused problem were pretty straightforward queries 
 (such as a boolean query with two musts, one must is equal match and 
 other 
 a range match on long) but the real problem was with the size. When I 
 kept 
 size as Integer.MAX_VALUE, it caused all the problems. When I removed it, 
 it started working fine. I think it is worth mentioning somewhere about 
 this strange behavior (probably expected but strange).

 I did double up on the RAM though and now I have allocated 5*10G RAM 
 to the cluster. Things are looking ok as of now, except that the 
 aggregations (on strings) are quite slow. May be I would run these 
 aggregations as batch and cache the outputs in a different type and move 
 on 
 for now.

 Thanks
 NY


 On Fri, Aug 22, 2014 at 10:34 PM, Ivan Brusic iv...@brusic.com 
 wrote:

 How expensive are your queries? Are you using aggregations or sorting 
 on string fields that could use up your field data cache? Are you using 
 the 
 defaults for the cache? Post the current usage.

 If you 

Re: Json Data not getting parsed when sent to Elasticsearch

2014-08-24 Thread Didjit
Pretty simple (below). I just added the json codec and tried again, and 
received the same results. Thank you!

elasticsearch { 
    host => "localhost"
    cluster => "cjceswin"
    node_name => "cjcnode"
    codec => "json"
    index => "logstash-dwhse-%{+.MM.dd}"
    workers => 3
}

}








Re: Optimizing queries for a 5 node cluster with 250 M documents (causes OutOfMemory exceptions and GC pauses)

2014-08-24 Thread joergpra...@gmail.com
Exactly. Filters do not use scores. They also use bit sets, which makes them
reusable and fast.

I wasn't talking about a filter added to a query; I meant filtered queries.
This is a huge difference.

This query

{
   "query" : {
      "bool": {
         "must": {
            "match" : { "body" : "big" }
         },
         "must_not": {
            "match" : { "body" : "data" }
         },
         "must": {
            "match" : { "id": 521 }
         }
      }
   }
}

can be turned into this filtered query

{
  "query" : {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            { "match" : { "body" : "big" } },
            { "match" : { "id": 521 } }
          ],
          "must_not": {
            "match" : { "body" : "data" }
          }
        }
      }
    }
  }
}

(plus fixing the duplicate "must" key, which is a potential source of errors)

Jörg



indices.memory.index_buffer_size

2014-08-24 Thread Yongtao You
Hi,

Is the indices.memory.index_buffer_size configuration a cluster-wide 
configuration or a per-node configuration? Do I need to set it on every node? 
Or just the master (eligible) node?

Thanks.
Yongtao

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/f67e3a30-521c-4c13-8620-c79133cea01c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Topics/Entities with relevancy scores and searching

2014-08-24 Thread Scott Decker
Interesting.
So, set a payload on the term (in this case the topic/entity) where the 
payload is the relevancy value. Then you can do your function score on the 
query of the main documents themselves, no need for parent/child.
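(A minimal sketch of the indexing side of that idea — the topic names and the
`token|payload` formatting follow the delimited-payload token filter's default
`|` delimiter; the helper name is my own:)

```python
def topics_to_payload_field(topic_scores):
    """Format (topic, relevancy) pairs for a delimited-payload field:
    whitespace-separated tokens, each carrying its payload after '|'."""
    return " ".join("%s|%s" % (topic, score) for topic, score in topic_scores)

field_value = topics_to_payload_field([("barack_obama", 0.93345),
                                       ("apple_inc", 0.0034)])
print(field_value)  # barack_obama|0.93345 apple_inc|0.0034
```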

Have you done this? Any concerns about performance with this sort of scoring, 
or is it just as fast as base Lucene scoring if we override the score function 
and just use our own?
-- we will of course try it and run our own performance tests, just looking 
to see if you already have any insights.

Super helpful!
Scott


On Saturday, August 23, 2014 7:50:18 AM UTC-7, Clinton Gormley wrote:

 Have a look at:

 * 
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-delimited-payload-tokenfilter.html
 * 
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html




 On 23 August 2014 15:04, Scott Decker sc...@publishthis.com
  wrote:

 Hey all,
   a question on possible search paths/structure.  If we have a text 
 document, and we have run our magic over it and come away with Topics and 
 Entities (Like, Barack Obama and Apple Inc.) and we have a relevancy score 
 for each one, what would be the best way to store and query against them?

 we currently are trying a parent/child relationship, where the children 
 are the terms with their relevancy score and the scoring of the parent text 
 document gets done from the relevancy scores of the children. That works. 
 Just worried about speed of parent/child against millions of documents.

 Another way we could think of was, build our own scorer/analyzer.  If we 
 are reading in tokens like BarackObama.93345|AppleInc.0034
 where it has the topic and the relevancy score to the document in it, i 
 can build an analyzer to read those sorts of tokens, but is there any way 
 to build a scorer that can use that token match data to score?

 And third, is there any other way to normalize this data into one
 document so we can score on it? That seems like it would be the fastest way
 to query, but my #2 option here is the only way I can think of doing it.
 Is anyone else tagging their documents with relevancy scores for topics on the
 document, and then letting people search for those topics and pulling back
 the relevant docs based on the per-document relevancy scores?

 Thanks,
 Scott







Re: Boost the first word in a multi-word query

2014-08-24 Thread Jérémy
Thanks Vineeth, I can certainly build something with the query string :-)
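One way to "build something" like that (my own sketch, not from the thread):
give each word a boost that decreases with its position, then hand the result
to a query_string query, so earlier words dominate the score while all words
stay optional:

```python
def positional_boosts(text):
    """Build a query_string query where earlier words get higher boosts:
    for N words, the first gets boost N and the last gets boost 1."""
    words = text.split()
    n = len(words)
    boosted = " ".join("%s^%d" % (w, n - i) for i, w in enumerate(words))
    return {"query": {"query_string": {"query": boosted}}}

q = positional_boosts("brown dog")
print(q["query"]["query_string"]["query"])  # brown^2 dog^1
```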


On Fri, Aug 22, 2014 at 8:50 PM, vineeth mohan vm.vineethmo...@gmail.com
wrote:

 Hello Jeremy ,

 You can try query_string then.

  Query as "Brown^2 dog"


 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#query-dsl-query-string-query

 Thanks
Vineeth


 On Sat, Aug 23, 2014 at 12:11 AM, Jérémy mer...@gmail.com wrote:

 Thanks for your answer!

 Unfortunately the phrase query is not enough, because I still want to
 keep words optional. In my understanding, the phrase query requires all the
 words of the query to be present.

 Cheers,
 Jeremy


 On Fri, Aug 22, 2014 at 8:20 PM, vineeth mohan vm.vineethmo...@gmail.com
  wrote:

 Hello Jeremy ,

 I feel what you are looking for is a phrase query . It takes into
 consideration the order of words -
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html#_phrase

 Thanks
   Vineeth


 On Fri, Aug 22, 2014 at 3:28 PM, Jeremy mer...@gmail.com wrote:

 In case of a multi-word query, is there a way to boost the first terms
 of the query?

 For example, in the following query:
 GET /my_index/my_type/_search
 {
   "query": {
     "match": {
       "title": "BROWN DOG!"
     }
   }
 }

 "Brown" should be prioritized over "dog", therefore searching for
 "brown dog" should not return the same scores as searching for "dog brown".
 I'm ideally looking for a solution which works with N words and weights
 them according to their position.

 Regards,
 Jeremy





Re: One large index vs. many smaller indexes

2014-08-24 Thread Chris Neal
Adrien,

Thanks so much for the response.  It was very helpful.  I will check out
those links on capacity planning for sure.

One followup question.  You mention that tens of shards per node would be
ok.  Are you meaning tens of shards from tens of indexes?  Or tens of
shards for a single index?  Right now I have two servers configured with
the index getting 2 shards (one per server), and 1 replica (per server).

Chris


On Fri, Aug 22, 2014 at 5:58 PM, Adrien Grand 
adrien.gr...@elasticsearch.com wrote:

 Hi Chris,

 Usually, the problem is not that much in terms of indices but shards,
 which are the physical units of data storage (an index being a logical view
 over several shards).

 Something to beware of is that shards typically have some constant
 overhead (disk space, file descriptors, memory usage) that does not depend
 on the amount of data that they store. Although it would be ok to have up
 to a few tens of shards per node, you should avoid having e.g. thousands
 of shards per node.

 If you plan on always adding a filter for a specific application in your
 search requests, then splitting by application makes sense, since it makes
 the filter unnecessary at search time: you just query the
 application-specific index. On the other hand, if you don't filter by
 application, then splitting the data yourself into smaller indices would be
 pretty much equivalent to storing everything in a single index with a higher
 number of shards.

 You might want to check out the following resources that talk about
 capacity planning:
  - http://www.elasticsearch.org/videos/big-data-search-and-analytics/
  -
 http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/capacity-planning.html



 On Fri, Aug 22, 2014 at 9:08 PM, Chris Neal chris.n...@derbysoft.net
 wrote:

 Hi all,

 As the subject says, I'm wondering about index size vs. number of indexes.

 I'm indexing many application log files, currently with an index by day
 for all logs, which will make a very large index.  For just a few
 applications in Development, the index is 55GB a day (across 2 servers).
  In prod with all applications, it will be much more than that.  1TB a
 day maybe?

 I'm wondering if there is value in splitting the indexes by day and by
 application, which would produce more indexes per day, but they would be
 smaller, vs. value in having a single, mammoth index by day alone.

 Is it just a resource question?  If I have enough RAM/disk/CPU to support
 a mammoth index, then am I fine?  Or are there other reasons to (or not to)
 split up indexes?

 Very much appreciate your time.
 Chris





 --
 Adrien Grand





Re: indices.memory.index_buffer_size

2014-08-24 Thread Mark Walkom
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-indices.html
states: "It is a global setting that bubbles down to all the different
shards allocated on a specific node."

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 25 August 2014 03:12, Yongtao You yongtao@gmail.com wrote:

 Hi,

 Is the indices.memory.index_buffer_size configuration a cluster wide
 configuration or per node configuration? Do I need to set it on every node?
 Or just the master (eligible) node?

 Thanks.
 Yongtao





Re: What fields does ElasticSearch map by default?

2014-08-24 Thread vineeth mohan
Hello Albert ,

Few things here:


   1. Yes, you can tell Elasticsearch which fields to index and which not
   to. You can use the "index": "yes"/"no" property for each field in the
   schema to specify this. -
   
http://stackoverflow.com/questions/13626617/specify-which-fields-are-indexed-in-elasticsearch
   2. There is a concept of _all in Elasticsearch. This is a super-set of
   all field values, and to search on the entire document you can simply
   search on the _all field. -
   
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-all-field.html#mapping-all-field
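
(A minimal sketch of point 1 as an ES 1.x-style mapping — the field names
"camera_model" and "exif_blob" are invented for illustration; "index": "no"
keeps a field in _source but makes it unsearchable:)

```python
import json

# Illustrative mapping: "camera_model" is searchable, while "exif_blob"
# is kept in the stored _source but not indexed ("index": "no"),
# so it cannot be queried.
mapping = {
    "image": {
        "properties": {
            "camera_model": {"type": "string"},
            "exif_blob": {"type": "string", "index": "no"},
        }
    }
}
print(json.dumps(mapping, indent=2))
```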

Thanks
 Vineeth


On Sun, Aug 24, 2014 at 4:07 AM, Albert Lim albertlim...@gmail.com wrote:

 I'm trying to create an image metadata store, and obviously a single image
 can have 20 or more metadata fields.

 So if I enter this document into ElasticSearch, will it index/map all
 those fields? Such that I can query for every field? Or can I tell
 ElasticSearch what to index or not?





Elasticsearch Function Score not working with object type

2014-08-24 Thread Pablo Musa
Hey guys,
I am trying to use the function score but I am getting the following error:

ElasticsearchIllegalArgumentException[No field found for [fsot] in mapping
with types [tst]];

I have used function score before and it worked like a charm so I started
digging what was wrong. I found out that it does not work with object type.

Am I doing something wrong? What am I missing here?

The following gist contains an example and the error I received.
https://gist.github.com/pmusa/ef9a02210d736ee020d9

Thanks in advance,
Pablo Musa



Re: DOS attack Elasticsearch with Mappings

2014-08-24 Thread Nikolas Everett
If the cluster is that open to users, I don't think it'd be easy to prevent
a malicious user from intentionally DOSing it. But in this case I think you
could make the default for all fields be non-dynamic. That way users have
to intentionally send all mapping updates. It'd prevent this sort of
unintentional DOS.

I think this is a setting that you can change, and I think that it would
only affect new indexes, but I admit to not having done it and am going from
a vague memory of seeing a setting somewhere.
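(The mechanism Nik is half-remembering is probably dynamic-mapping control; as
a hedged sketch of an ES 1.x type mapping, "dynamic": "strict" makes documents
with unmapped fields be rejected instead of growing the mapping — the type and
field names below are invented for illustration:)

```python
# With "dynamic": "strict", indexing a document that contains a field not
# listed under "properties" is rejected rather than auto-mapped, which
# blocks dictionary-shaped documents from exploding the cluster state.
mapping = {
    "logs": {
        "dynamic": "strict",
        "properties": {
            "message": {"type": "string"},
            "timestamp": {"type": "date"},
        }
    }
}
print(mapping["logs"]["dynamic"])  # strict
```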

Nik
On Aug 24, 2014 11:08 PM, Joshua Montgomery josh1s4l...@gmail.com wrote:

 So an Elasticsearch clusters I help run had an interesting issue last week
 around mappings and I wanted to get the communities thoughts about how to
 handle it.

 *Issue:*
 Our cluster one morning went into utter chaos for no apparent reason. We
 had nodes dropping constantly (master and data nodes) and lots of
 network exceptions in our log files. The cluster kept going red from all
 the dropped nodes and was totally unresponsive to external commands.

 *Some Backgound:*
 Our cluster is fairly open to our users, meaning they can index whatever
 they want without needing approval (this may have to change based on what
 happened). The content stored is usually generated from .Net objects and
 serialized using the Newtonsoft JSON serializer.

 *Cause:*
 After 6hrs of investigation while trying to get our cluster stable, this
 is what we found:

 We had a new document type (around 30,000 documents) indexed into the
 cluster over a 1 hour window containing the .Net equivalent of a dictionary
 in json format. When a dictionary is serialized to json, it ends up with a
 json object containing a list of properties and values. The current
 behavior of Elasticsearch is to generate a mapping definition for each
 field name in a json object. So when you serialize a dictionary, it means
 every 'key' in the dictionary gets its own mapping definition. It turns out
 this can lead to nasty consequences when indexed in Elasticsearch...

 Essentially, every document contained its own list of unique keys which
 resulted in Elasticsearch generating mapping definitions for all the keys.
 We found this out by noticing that the json type with the dictionary
 continuously kept having its mappings updated (based on the master node log
 files). The continual updating of the mappings (which is part of the
 overall state file) caused the master nodes to lock up on the updates,
 effectively stopping all other cluster operations. The state file, upon
 further investigation, was over 70MB by the time we ended up stopping
 the cluster. Stopping the cluster was the only way to stop updates to the
 mappings. The large mapping file we suspect was one of the major reasons
 for nodes dropping; connections would timeout during the large file copy
 (i'm assuming the state is passed around the nodes in the cluster).

 *Solution:*
 As previously mentioned we had to stop the cluster. We then had to make
 sure that all indexing operations were stopped. Upon restarting the cluster
 we deleted all documents of the poisonous document type (which took a
 while). This resulted is a much smaller state file and a stable cluster.

 *Prevention:*
 So this is my real question for the community, what is the correct action
 for preventing this in the future (or does it already exist). We could
 obviously start more closely reviewing what goes into our cluster, but
 should there be a feature in Elasticsearch to prevent this (assuming it
 doesn't already exist)? I'm assuming that there are a number of users who
 have clusters where they don't review everything that goes into their
 cluster. So would it make sense to have Elasticsearch provide some feature
 to prevent this issue, which is the equivalent to a DOS attack on the
 cluster?

 Thanks for reading this and I look forward to your responses!

 -Josh Montgomery





Re: Need some advice to build a log central.

2014-08-24 Thread vineeth mohan
Hello Sang ,

As this is a question-and-answer forum, we highly recommend that you take a
shot yourself and post questions if you hit a dead end.

Thanks
   Vineeth


On Mon, Aug 25, 2014 at 7:56 AM, Sang Dang zkid...@gmail.com wrote:

 Hi All,
 I am going to build a log central using ElasticSearch.
 I need some advice from anyone who have built it already.








Re: Elasticsearch Function Score not working with object type

2014-08-24 Thread vineeth mohan
Hello Pablo ,

Lucene (the underlying library on which ES is built) has only a key-value
concept and does not keep object-level information. This means that on the
Lucene side the data would be stored as

fsot.testObjects : [ test1, test2 ]

and there is no field named fsot on the Lucene side. This means that you
need to give the field name as fsot.testObjects rather than fsot.

Thanks
 Vineeth


On Mon, Aug 25, 2014 at 7:57 AM, Pablo Musa pablitom...@gmail.com wrote:

 Hey guys,
 I am trying to use the function score but I am getting the following error:

 ElasticsearchIllegalArgumentException[No field found for [fsot] in mapping
 with types [tst]];

 I have used function score before and it worked like a charm so I started
 digging what was wrong. I found out that it does not work with object type.

 Am I doing something wrong? What am I missing here?

 The following gist contains an example and the error I received.
 https://gist.github.com/pmusa/ef9a02210d736ee020d9

 Thanks in advance,
 Pablo Musa





Re: Elasticsearch Function Score not working with object type

2014-08-24 Thread pablitomusa
It worked. Thank you very much.

* copying the final code for future reference:

POST test/tst/_search
{
  "query": {
    "function_score": {
      "boost_mode": "replace",
      "query": {
        "filtered": {
          "query": {
            "match_all": {}
          },
          "filter": {
            "exists": {
              "field": "fsot"
            }
          }
        }
      },
      "functions": [
        {
          "script_score": {
            "script": "if ( doc.get('fsot.testobj') == null ) 0; else 1;"
          }
        }
      ]
    }
  }
}







Re: Error running ES DSL in hadoop mapreduce

2014-08-24 Thread Sona Samad
Hi Adrien,
 
My elasticsearch version is :  elasticsearch-1.2.1 
 
The Maven dependency for hadoop:
 
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-hadoop-mr</artifactId>
  <version>2.0.1</version>
</dependency>
 
 
The full stack trace is given below:
 
[2014-08-25 09:31:58,892][DEBUG][action.search.type   ] [Thane Ector] [mr][4], node[1ZbXSvkKQC-kDvgMXuC8iQ], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@6ed78f6d]
org.elasticsearch.search.query.QueryPhaseExecutionException: [mr][4]: query[ConstantScore(cache(_type:logs))],from[0],size[50]: Query Failed [Failed to execute main query]
 at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:162)
 at org.elasticsearch.search.SearchService.executeScan(SearchService.java:215)
 at org.elasticsearch.search.action.SearchServiceTransportAction$19.call(SearchServiceTransportAction.java:444)
 at org.elasticsearch.search.action.SearchServiceTransportAction$19.call(SearchServiceTransportAction.java:441)
 at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:517)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 97
 at org.elasticsearch.common.util.BigArrays$IntArrayWrapper.set(BigArrays.java:185)
 at org.elasticsearch.search.aggregations.metrics.cardinality.HyperLogLogPlusPlus$Hashset.values(HyperLogLogPlusPlus.java:499)
 at org.elasticsearch.search.aggregations.metrics.cardinality.HyperLogLogPlusPlus.upgradeToHll(HyperLogLogPlusPlus.java:307)
 at org.elasticsearch.search.aggregations.metrics.cardinality.HyperLogLogPlusPlus.collectLcEncoded(HyperLogLogPlusPlus.java:245)
 at org.elasticsearch.search.aggregations.metrics.cardinality.HyperLogLogPlusPlus.collectLc(HyperLogLogPlusPlus.java:239)
 at org.elasticsearch.search.aggregations.metrics.cardinality.HyperLogLogPlusPlus.collect(HyperLogLogPlusPlus.java:231)
 at org.elasticsearch.search.aggregations.metrics.cardinality.CardinalityAggregator$DirectCollector.collect(CardinalityAggregator.java:204)
 at org.elasticsearch.search.aggregations.metrics.cardinality.CardinalityAggregator.collect(CardinalityAggregator.java:118)
 at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectBucketNoCounts(BucketsAggregator.java:74)
 at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectExistingBucket(BucketsAggregator.java:63)
 at org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator.collect(GlobalOrdinalsStringTermsAggregator.java:98)
 at org.elasticsearch.search.aggregations.AggregationPhase$AggregationsCollector.collect(AggregationPhase.java:157)
 at org.elasticsearch.common.lucene.MultiCollector.collect(MultiCollector.java:60)
 at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:193)
 at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:163)
 at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:35)
 at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
 at org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:175)
 at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:309)
 at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:116)
 ... 7 more
[2014-08-25 09:31:58,894][DEBUG][action.search.type   ] [Thane Ector] All shards failed for phase: [init_scan]
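The "Caused by" above lands inside the cardinality aggregation's HyperLogLog++ sketch, on the path that upgrades from linear counting to HLL (upgradeToHll). For what it's worth, the cardinality aggregation does expose a precision_threshold option that controls that linear-counting/HLL trade-off; whether setting it explicitly sidesteps this particular exception is an assumption on my part, not a verified fix. A minimal Python sketch of the sub-aggregation body:

```python
import json

# Sketch: the same cardinality sub-aggregation from the query below, with an
# explicit precision_threshold. precision_threshold is a documented option of
# the cardinality aggregation; treating it as a workaround for this bug is an
# assumption, not a confirmed fix.
cardinality_agg = {
    "cardinality": {
        "field": "ExamRowKey",
        "precision_threshold": 100,  # counts below this are near-exact
    }
}

print(json.dumps(cardinality_agg, indent=2))
```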
 
Thanks,
Sona
 

On Friday, August 22, 2014 5:07:33 PM UTC+5:30, Sona Samad wrote:

 Hi,

 I was trying to run the query below from a Hadoop MapReduce job:

 {
   "aggs": {
     "group_by_body_part": {
       "terms": {
         "field": "body_part",
         "size": 5,
         "order": { "examcount": "desc" }
       },
       "aggs": {
         "examcount": {
           "cardinality": {
             "field": "ExamRowKey"
           }
         }
       }
     }
   }
 }
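For reference, the request above can be reconstructed as a plain dict and sent straight to the _search endpoint with a top-level "size": 0, which suppresses raw hits so only aggregation buckets come back (the inner terms "size" only caps the number of buckets). This is a sketch of the request body, not the es-hadoop code path:

```python
import json

# Reconstruction of the quoted aggregation request. The top-level "size": 0
# keeps raw documents out of the response; the terms aggregation's own
# "size": 5 limits how many buckets are returned after shard results merge.
body = {
    "size": 0,  # no raw hits, aggregation results only
    "aggs": {
        "group_by_body_part": {
            "terms": {
                "field": "body_part",
                "size": 5,
                "order": {"examcount": "desc"},
            },
            "aggs": {
                "examcount": {
                    "cardinality": {"field": "ExamRowKey"}
                }
            },
        }
    },
}

print(json.dumps(body))
```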

 The query returns more than 5 records, even though the size is set to 5.
 Also, the results are not aggregated; instead, entire records from the
 index are passed to the mapper as values.

 Also, the following error is logged:

 [2014-08-22 16:06:21,459][DEBUG][action.search.type   ] [Algrim the Strong] All shards failed for phase: [init_scan]
 [2014-08-22 16:26:38,875][DEBUG][action.search.type   ] [Algrim the Strong] [mr][0], node[r9u9daW_TkqTBBeazKJQNw], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@31b5b771]
 org.elasticsearch.search.query.QueryPhaseExecutionException: [mr][0]: query[ConstantScore(cache(_type:logs))],from[0],size[50]: Query Failed [Failed to execute main query]
 at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:162)
 at