Re: How can I store 2 different data types in the same field of 2 different documents?

2015-01-08 Thread Radu Gheorghe
Hi Paresh,

You're welcome. I'm glad I nailed it!

Best regards,
Radu
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Fri, Jan 9, 2015 at 9:25 AM, Paresh Behede  wrote:

> Thank you so much Radu...solution worked for me...
>
> Regards,
> Paresh B.
>
> On Thursday, 8 January 2015 21:11:47 UTC+5:30, Radu Gheorghe wrote:
>>
>> Thanks, David! I had no idea it works until... about one hour ago :)
>>
>> --
>> Performance Monitoring * Log Analytics * Search Analytics
>> Solr & Elasticsearch Support * http://sematext.com/
>>
>> On Thu, Jan 8, 2015 at 4:01 PM, David Pilato  wrote:
>>
>>> Very nice Radu. I love this trick. :)
>>>
>>> --
>>> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
>>> @dadoonet | @elasticsearchfr | @scrutmydocs
>>>
>>>
>>> On 8 Jan 2015, at 14:43, Radu Gheorghe wrote:
>>>
>>> Hi Paresh,
>>>
>>> If you want to sort on the field, I think it has to be the same type. So
>>> if you make everything a double, it should work for all numeric fields. To
>>> do that, you can use dynamic templates.
>>> For example if you have this:
>>>
>>>   "mappings" : {
>>> "_default_" : {
>>>"dynamic_templates" : [ {
>>>  "long_to_float" : {
>>>"match" : "*",
>>>"match_mapping_type" : "long",
>>>"mapping" : {
>>>  "type" : "float"
>>>}
>>>  }
>>>} ]
>>>  }
>>>   }
>>>
>>> And if you add a new field with value=32, the field will be mapped as float
>>> instead of long.
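>>>
>>> For example (a sketch, with a hypothetical index name), you could apply
>>> it like this and then index both integer and decimal values:
>>>
>>> curl -XPUT 'localhost:9200/myindex' -d '{
>>>   "mappings" : {
>>>     "_default_" : {
>>>       "dynamic_templates" : [ {
>>>         "long_to_float" : {
>>>           "match" : "*",
>>>           "match_mapping_type" : "long",
>>>           "mapping" : { "type" : "float" }
>>>         }
>>>       } ]
>>>     }
>>>   }
>>> }'
>>> curl -XPOST 'localhost:9200/myindex/doc' -d '{"age": 32}'
>>> curl -XPOST 'localhost:9200/myindex/doc' -d '{"age": 32.5}'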
>>>
>>> Best regards,
>>> Radu
>>> --
>>> Performance Monitoring * Log Analytics * Search Analytics
>>> Solr & Elasticsearch Support * http://sematext.com/
>>>
>>> On Thu, Jan 8, 2015 at 11:14 AM, Paresh Behede 
>>> wrote:
>>>
 Hi,

 I have a requirement to store documents in Elasticsearch which will
 have dynamic fields, and those fields could have values of different
 data types...

 For example:
 Document 1 could have an age field with value = 32, so when I insert the
 1st document in ES my index mapping will get created and age will be mapped
 to Integer/Long.

 Now if I get age = 32.5 in another document, ES will throw me an exception
 about the data type mismatch...

 Can you suggest what I can do to handle such a scenario?

 As a workaround we are creating different fields for different data types,
 like age.long / age.double, but this also won't work if I have to do sorting
 over the age field...

 Kindly suggest...

 Thanks in advance,
 Paresh Behede



Re: How can I store 2 different data types in the same field of 2 different documents?

2015-01-08 Thread Paresh Behede
Thank you so much Radu...solution worked for me...

Regards,
Paresh B.

On Thursday, 8 January 2015 21:11:47 UTC+5:30, Radu Gheorghe wrote:
>
> Thanks, David! I had no idea it works until... about one hour ago :)
>
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
> On Thu, Jan 8, 2015 at 4:01 PM, David Pilato  > wrote:
>
>> Very nice Radu. I love this trick. :)
>>
>> -- 
>> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
>> @dadoonet | @elasticsearchfr | @scrutmydocs
>>
>>
>> On 8 Jan 2015, at 14:43, Radu Gheorghe wrote:
>>
>> Hi Paresh,
>>
>> If you want to sort on the field, I think it has to be the same type. So
>> if you make everything a double, it should work for all numeric fields. To
>> do that, you can use dynamic templates.
>> For example if you have this:
>>
>>   "mappings" : {
>> "_default_" : {
>>"dynamic_templates" : [ {
>>  "long_to_float" : {
>>"match" : "*",
>>"match_mapping_type" : "long",
>>"mapping" : {
>>  "type" : "float"
>>}
>>  }
>>} ]
>>  }
>>   }
>>
>> And if you add a new field with value=32, the field will be mapped as float
>> instead of long.
>>
>> Best regards,
>> Radu
>> --
>> Performance Monitoring * Log Analytics * Search Analytics
>> Solr & Elasticsearch Support * http://sematext.com/
>>
>> On Thu, Jan 8, 2015 at 11:14 AM, Paresh Behede > > wrote:
>>
>>> Hi,
>>>
>>> I have a requirement to store documents in Elasticsearch which will have
>>> dynamic fields, and those fields could have values of different data types...
>>>
>>> For example:
>>> Document 1 could have an age field with value = 32, so when I insert the
>>> 1st document in ES my index mapping will get created and age will be mapped
>>> to Integer/Long.
>>>
>>> Now if I get age = 32.5 in another document, ES will throw me an exception
>>> about the data type mismatch...
>>>
>>> Can you suggest what I can do to handle such a scenario?
>>>
>>> As a workaround we are creating different fields for different data types,
>>> like age.long / age.double, but this also won't work if I have to do sorting
>>> over the age field...
>>>
>>> Kindly suggest...
>>>
>>> Thanks in advance,
>>> Paresh Behede
>>>
>>>
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to elasticsearc...@googlegroups.com .
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elasticsearch/ec663bd5-cf3b-4a3f-8828-03c4c53d3837%40googlegroups.com
>>>  
>>> 
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/CAHXA0_09uEGnDtJegPyZ-FY%2BUeCzDs_N1_%2BPsCYxCHu7_ErZVw%40mail.gmail.com
>>  
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/E0768DFA-EF17-46F2-B488-5EC29A60E37D%40pilato.fr
>>  
>> 
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/3f740c21-3a2f-4794-9ee1-fb0c4088b48e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: counting items in a list [array] returns (what we think) are incorrect counts via groovy

2015-01-08 Thread Jeff Steinmetz
Now that I am into the real world scenario, it gets a bit trickier - I have
nested objects (keys).
I have to test for the existence of the key in the Groovy script to avoid
parsing errors on insert.

How do you access a nested object in Groovy? And how do you test for the
existence of a nested object key?
Such as this example:

curl -XPOST 'http://'$NODE':9200/'$INDEX_NAME'/post' -d '{
  "titles": ["title 1", "title 2", "title 3", "title 4"],
  "raw" : {
"links" : ["http://bit.ly/abc";, "http://bit.ly/abc";, 
"http://bit.ly/def";, "http://bit.ly/ghi";]
  }
}'

This doesn't seem to work (from what I can tell it never finds the key
raw.links even when it does exist):

  "script" : "if (ctx._source.containsKey('raw.links') )
{ctx._source.links_url_count = ctx._source['raw.links'].size() } else {
ctx._source.links_url_count = 0 }"

Simple keys work though like ctx._source.containsKey('title') 
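
My guess (untested) is that _source is a map of maps, so
containsKey('raw.links') looks for that literal dotted key, which never
exists; checking and descending one level at a time might be the way:

  "script" : "if (ctx._source.containsKey('raw') && ctx._source.raw.containsKey('links'))
{ ctx._source.links_url_count = ctx._source.raw.links.size() } else {
ctx._source.links_url_count = 0 }"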

On Thursday, January 8, 2015 at 9:59:56 PM UTC-8, Nikolas Everett wrote:
>
> Transform never saves to source. You have to transform on the application 
> side for that. It was designed for times when you wanted to index something 
> like this that would just take up extra space in the source document. I 
> imagine you could use a script field on the query if you need the result to 
> contain the count. Or just count it on the result side. 
>
> Nik
> On Jan 9, 2015 12:43 AM, "Jeff Steinmetz"  > wrote:
>
>> Transform worked well.  Nice.
>>
>> Curious how to get it to save to source?  Tried this below, no go.  (I 
>> can however do range queries against title_count, so transform was indexed
>> and works well)
>>
>> "transform" : {
>>   "script" : "ctx._source['\'title_count\''] = 
>> ctx._source['\'titles\''].size()",
>>   "lang": "groovy"
>> },
>>  "properties": {
>>  "titles": { "type": "string", "index": "not_analyzed" },
>>  "title_count" : { "type": "integer", "store": "yes" }
>>}
>> }'
>>
>>
>> On Thursday, January 8, 2015 at 9:15:28 PM UTC-8, Nikolas Everett wrote:
>>>
>> Source is going to be pretty slow, yeah. If it's a one-off then it's
>> probably fine but if you do it a lot it's probably best to index the count.
>>> On Jan 9, 2015 12:04 AM, "Jeff Steinmetz"  wrote:
>>>
 Thank you, that worked.

 I was curious about the speed, is running a script using _source slower 
 than doc[] ?

 Totally understand a dynamic script is slower regardless of _source vs 
 doc[].

 Makes sense that having a count transformed up front during index to 
 create a materialized value would certainly be much faster.


 On Thursday, January 8, 2015 at 7:04:40 PM UTC-8, Nikolas Everett wrote:
>
>
>
> On Thu, Jan 8, 2015 at 9:09 PM, Jeff Steinmetz  
> wrote:
>
> Is there a better way to do this?
>>
 Please see this gist (or even better yet, run the script locally to see
 the issue).
>>
>> https://gist.github.com/jeffsteinmetz/2ea8329c667386c80fae
>>
>> You must have scripting enabled in your elasticsearch config for this 
>> to work.
>>
>> This was originally based on some comments I found here:
>> http://stackoverflow.com/questions/17314123/search-by-size-
>> of-object-type-field-elastic-search
>>
>> We would like to use a filtered query to only include documents that have
>> a small count of items in the list [aka array], filtering where 
>>  values.size() < 10
>>
>> "script": "doc['titles'].values.size() < 10"
>>
>> Turns out the values.size() actually either counts tokenized 
>> (analyzed) words, or if the mapping turns off analysis, it still counts 
>> incorrectly if there are duplicates.
>> If analyze is not turned off, it counts tokenized words, not the 
>> number of elements in the list.
>> If analyze is turned off for a given field, it improves, but 
>> duplicates are missed.
>>
>> For example, This comes back as size == 2
>> "titles": ["one", "duplicate", "duplicate"]
>> This comes back as size == 3, should be 4
>> "titles": ["http://bit.ly/abc";, "http://bit.ly/abc";, "
>> http://bit.ly/def";, "http://bit.ly/ghi";]
>>
>> Is this a bug, is there a better way, or is this just something that 
>> we don't understand about groovy and values.size()?
>>
>>
>>
> I think that's just the way doc[] works.  Try (but don't actually 
> deploy) _source['titles'].size() < 10.  That should do what you expect.  
> Don't deploy that because its too slow.  Try indexing the size and 
> filtering on it.  You can use a transform to add the size of the array as 
> an integer field and just filter on it using a range filter.  That'd 
> probably be the fastest option.
>
> Nik
>

How to get only the required results from aggregation

2015-01-08 Thread cto@TCS
Hi,

I have an input JSON of the format
{
  "shopName": "Shop01",
  "address": "xyz",
  "rackId": "ac015",
  "rackProductList": [
    {
      "name": "book",
      "price": 111,
      "weight": 123
    },
    {
      "name": "notebook",
      "price": 133,
      "weight": 123
    },
    {
      "name": "pencil-box",
      "price": 131,
      "weight": 123
    }
  ]
}

"rackProductList" is a nested object.
Now, I want to get the *max price* of *books* across all racks.  

I am using the search query
POST  /_search

{
  "size": 0,
  "aggs": {
    "attribute": {
      "nested": {
        "path": "rackProductList"
      },
      "aggs": {
        "group_by_name": {
          "terms": {
            "field": "rackProductList.name"
          },
          "aggs": {
            "max_value": {
              "max": {
                "field": "rackProductList.price"
              }
            }
          }
        }
      }
    }
  }
}

This query is returning the max price of all the items.
How can I get only the max price value for books?
Thanks.
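
I am wondering whether replacing the terms aggregation with a filter
aggregation on the name would do it - an untested sketch:

{
  "size": 0,
  "aggs": {
    "attribute": {
      "nested": {
        "path": "rackProductList"
      },
      "aggs": {
        "only_books": {
          "filter": {
            "term": { "rackProductList.name": "book" }
          },
          "aggs": {
            "max_value": {
              "max": { "field": "rackProductList.price" }
            }
          }
        }
      }
    }
  }
}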




Re: counting items in a list [array] returns (what we think) are incorrect counts via groovy

2015-01-08 Thread Nikolas Everett
Transform never saves to source. You have to transform on the application
side for that. It was designed for times when you wanted to index something
like this that would just take up extra space in the source document. I
imagine you could use a script field on the query if you need the result to
contain the count. Or just count it on the result side.
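
For example, a script_fields sketch (untested, using the titles field from
the mapping below) to return the count with each hit:

{
  "query": { "match_all": {} },
  "script_fields": {
    "title_count": {
      "script": "_source.titles.size()",
      "lang": "groovy"
    }
  }
}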

Nik
On Jan 9, 2015 12:43 AM, "Jeff Steinmetz" 
wrote:

> Transform worked well.  Nice.
>
> Curious how to get it to save to source?  Tried this below, no go.  (I can
> however do range queries against title_count, so transform was indexed and
> works well)
>
> "transform" : {
>   "script" : "ctx._source['\'title_count\''] =
> ctx._source['\'titles\''].size()",
>   "lang": "groovy"
> },
>  "properties": {
>  "titles": { "type": "string", "index": "not_analyzed" },
>  "title_count" : { "type": "integer", "store": "yes" }
>}
> }'
>
>
> On Thursday, January 8, 2015 at 9:15:28 PM UTC-8, Nikolas Everett wrote:
>>
>> Source is going to be pretty slow, yeah. If it's a one-off then it's
>> probably fine but if you do it a lot it's probably best to index the count.
>> On Jan 9, 2015 12:04 AM, "Jeff Steinmetz"  wrote:
>>
>>> Thank you, that worked.
>>>
>>> I was curious about the speed, is running a script using _source slower
>>> than doc[] ?
>>>
>>> Totally understand a dynamic script is slower regardless of _source vs
>>> doc[].
>>>
>>> Makes sense that having a count transformed up front during index to
>>> create a materialized value would certainly be much faster.
>>>
>>>
>>> On Thursday, January 8, 2015 at 7:04:40 PM UTC-8, Nikolas Everett wrote:



 On Thu, Jan 8, 2015 at 9:09 PM, Jeff Steinmetz 
 wrote:

 Is there a better way to do this?
>
> Please see this gist (or even better yet, run the script locally to see
> the issue).
>
> https://gist.github.com/jeffsteinmetz/2ea8329c667386c80fae
>
> You must have scripting enabled in your elasticsearch config for this
> to work.
>
> This was originally based on some comments I found here:
> http://stackoverflow.com/questions/17314123/search-by-size-
> of-object-type-field-elastic-search
>
> We would like to use a filtered query to only include documents that have a
> small count of items in the list [aka array], filtering where
>  values.size() < 10
>
> "script": "doc['titles'].values.size() < 10"
>
> Turns out the values.size() actually either counts tokenized
> (analyzed) words, or if the mapping turns off analysis, it still counts
> incorrectly if there are duplicates.
> If analyze is not turned off, it counts tokenized words, not the
> number of elements in the list.
> If analyze is turned off for a given field, it improves, but
> duplicates are missed.
>
> For example, This comes back as size == 2
> "titles": ["one", "duplicate", "duplicate"]
> This comes back as size == 3, should be 4
> "titles": ["http://bit.ly/abc";, "http://bit.ly/abc";, "
> http://bit.ly/def";, "http://bit.ly/ghi";]
>
> Is this a bug, is there a better way, or is this just something that
> we don't understand about groovy and values.size()?
>
>
>
 I think that's just the way doc[] works.  Try (but don't actually
 deploy) _source['titles'].size() < 10.  That should do what you expect.
 Don't deploy that because its too slow.  Try indexing the size and
 filtering on it.  You can use a transform to add the size of the array as
 an integer field and just filter on it using a range filter.  That'd
 probably be the fastest option.

 Nik


Re: counting items in a list [array] returns (what we think) are incorrect counts via groovy

2015-01-08 Thread Jeff Steinmetz
Transform worked well.  Nice.

Curious how to get it to save to source?  Tried this below, no go.  (I can
however do range queries against title_count, so transform was indexed and
works well)

"transform" : {
  "script" : "ctx._source['\'title_count\''] = 
ctx._source['\'titles\''].size()",
  "lang": "groovy"
},
 "properties": {
 "titles": { "type": "string", "index": "not_analyzed" },
 "title_count" : { "type": "integer", "store": "yes" }
   }
}'


On Thursday, January 8, 2015 at 9:15:28 PM UTC-8, Nikolas Everett wrote:
>
> Source is going to be pretty slow, yeah. If it's a one-off then it's
> probably fine but if you do it a lot it's probably best to index the count.
> On Jan 9, 2015 12:04 AM, "Jeff Steinmetz"  > wrote:
>
>> Thank you, that worked.
>>
>> I was curious about the speed, is running a script using _source slower 
>> than doc[] ?
>>
>> Totally understand a dynamic script is slower regardless of _source vs 
>> doc[].
>>
>> Makes sense that having a count transformed up front during index to 
>> create a materialized value would certainly be much faster.
>>
>>
>> On Thursday, January 8, 2015 at 7:04:40 PM UTC-8, Nikolas Everett wrote:
>>>
>>>
>>>
>>> On Thu, Jan 8, 2015 at 9:09 PM, Jeff Steinmetz  
>>> wrote:
>>>
>>> Is there a better way to do this?

 Please see this gist (or even better yet, run the script locally to see
 the issue).

 https://gist.github.com/jeffsteinmetz/2ea8329c667386c80fae

 You must have scripting enabled in your elasticsearch config for this 
 to work.

 This was originally based on some comments I found here:
 http://stackoverflow.com/questions/17314123/search-by-
 size-of-object-type-field-elastic-search

 We would like to use a filtered query to only include documents that have a
 small count of items in the list [aka array], filtering where 
  values.size() < 10

 "script": "doc['titles'].values.size() < 10"

 Turns out the values.size() actually either counts tokenized (analyzed) 
 words, or if the mapping turns off analysis, it still counts incorrectly 
 if 
 there are duplicates.
 If analyze is not turned off, it counts tokenized words, not the number 
 of elements in the list.
 If analyze is turned off for a given field, it improves, but duplicates 
 are missed.

 For example, This comes back as size == 2
 "titles": ["one", "duplicate", "duplicate"]
 This comes back as size == 3, should be 4
 "titles": ["http://bit.ly/abc";, "http://bit.ly/abc";, "http://bit.ly/def";, 
 "http://bit.ly/ghi";]

 Is this a bug, is there a better way, or is this just something that we 
 don't understand about groovy and values.size()?



>>> I think that's just the way doc[] works.  Try (but don't actually 
>>> deploy) _source['titles'].size() < 10.  That should do what you expect.  
>>> Don't deploy that because its too slow.  Try indexing the size and 
>>> filtering on it.  You can use a transform to add the size of the array as 
>>> an integer field and just filter on it using a range filter.  That'd 
>>> probably be the fastest option.
>>>
>>> Nik
>>>


Re: counting items in a list [array] returns (what we think) are incorrect counts via groovy

2015-01-08 Thread Nikolas Everett
Source is going to be pretty slow, yeah. If it's a one-off then it's probably
fine but if you do it a lot it's probably best to index the count.
On Jan 9, 2015 12:04 AM, "Jeff Steinmetz" 
wrote:

> Thank you, that worked.
>
> I was curious about the speed, is running a script using _source slower
> than doc[] ?
>
> Totally understand a dynamic script is slower regardless of _source vs
> doc[].
>
> Makes sense that having a count transformed up front during index to
> create a materialized value would certainly be much faster.
>
>
> On Thursday, January 8, 2015 at 7:04:40 PM UTC-8, Nikolas Everett wrote:
>>
>>
>>
>> On Thu, Jan 8, 2015 at 9:09 PM, Jeff Steinmetz 
>> wrote:
>>
>> Is there a better way to do this?
>>>
>>> Please see this gist (or even better yet, run the script locally to see
>>> the issue).
>>>
>>> https://gist.github.com/jeffsteinmetz/2ea8329c667386c80fae
>>>
>>> You must have scripting enabled in your elasticsearch config for this to
>>> work.
>>>
>>> This was originally based on some comments I found here:
>>> http://stackoverflow.com/questions/17314123/search-by-
>>> size-of-object-type-field-elastic-search
>>>
>>> We would like to use a filtered query to only include documents that have a
>>> small count of items in the list [aka array], filtering where
>>>  values.size() < 10
>>>
>>> "script": "doc['titles'].values.size() < 10"
>>>
>>> Turns out the values.size() actually either counts tokenized (analyzed)
>>> words, or if the mapping turns off analysis, it still counts incorrectly if
>>> there are duplicates.
>>> If analyze is not turned off, it counts tokenized words, not the number
>>> of elements in the list.
>>> If analyze is turned off for a given field, it improves, but duplicates
>>> are missed.
>>>
>>> For example, This comes back as size == 2
>>> "titles": ["one", "duplicate", "duplicate"]
>>> This comes back as size == 3, should be 4
>>> "titles": ["http://bit.ly/abc";, "http://bit.ly/abc";, "http://bit.ly/def";,
>>> "http://bit.ly/ghi";]
>>>
>>> Is this a bug, is there a better way, or is this just something that we
>>> don't understand about groovy and values.size()?
>>>
>>>
>>>
>> I think that's just the way doc[] works.  Try (but don't actually deploy)
>> _source['titles'].size() < 10.  That should do what you expect.  Don't
>> deploy that because its too slow.  Try indexing the size and filtering on
>> it.  You can use a transform to add the size of the array as an integer
>> field and just filter on it using a range filter.  That'd probably be the
>> fastest option.
>>
>> Nik
>>


Re: counting items in a list [array] returns (what we think) are incorrect counts via groovy

2015-01-08 Thread Jeff Steinmetz
Thank you, that worked.

I was curious about the speed, is running a script using _source slower 
than doc[] ?

Totally understand a dynamic script is slower regardless of _source vs 
doc[].

Makes sense that having a count transformed up front during index to create 
a materialized value would certainly be much faster.


On Thursday, January 8, 2015 at 7:04:40 PM UTC-8, Nikolas Everett wrote:
>
>
>
> On Thu, Jan 8, 2015 at 9:09 PM, Jeff Steinmetz  > wrote:
>
> Is there a better way to do this?
>>
>> Please see this gist (or even better yet, run the script locally to see
>> the issue).
>>
>> https://gist.github.com/jeffsteinmetz/2ea8329c667386c80fae
>>
>> You must have scripting enabled in your elasticsearch config for this to 
>> work.
>>
>> This was originally based on some comments I found here:
>>
>> http://stackoverflow.com/questions/17314123/search-by-size-of-object-type-field-elastic-search
>>
>> We would like to use a filtered query to only include documents that have a
>> small count of items in the list [aka array], filtering where 
>>  values.size() < 10
>>
>> "script": "doc['titles'].values.size() < 10"
>>
>> Turns out the values.size() actually either counts tokenized (analyzed) 
>> words, or if the mapping turns off analysis, it still counts incorrectly if 
>> there are duplicates.
>> If analyze is not turned off, it counts tokenized words, not the number 
>> of elements in the list.
>> If analyze is turned off for a given field, it improves, but duplicates 
>> are missed.
>>
>> For example, This comes back as size == 2
>> "titles": ["one", "duplicate", "duplicate"]
>> This comes back as size == 3, should be 4
>> "titles": ["http://bit.ly/abc";, "http://bit.ly/abc";, "http://bit.ly/def";, 
>> "http://bit.ly/ghi";]
>>
>> Is this a bug, is there a better way, or is this just something that we 
>> don't understand about groovy and values.size()?
>>
>>
>>
> I think that's just the way doc[] works.  Try (but don't actually deploy) 
> _source['titles'].size() < 10.  That should do what you expect.  Don't 
> deploy that because its too slow.  Try indexing the size and filtering on 
> it.  You can use a transform to add the size of the array as an integer 
> field and just filter on it using a range filter.  That'd probably be the 
> fastest option.
>
> Nik
>



Re: No failover if number_of_replicas exceeds number of nodes?

2015-01-08 Thread Mathew D
I've logged this as 
https://github.com/elasticsearch/elasticsearch/issues/9213.  Will mark this 
thread as complete in favour of the github issue.


On Friday, January 9, 2015 at 12:41:26 PM UTC+13, Mark Walkom wrote:
>
> It seems highly unusual that this is occurring. I'd recommend that you 
> open a github issue with details and see what the devs think.
>
> On 8 January 2015 at 09:38, Mathew D > 
> wrote:
>
>> Good point... actually it looks like the 'head' plugin is misreporting a 
>> yellow status when replicas=2:
>>
>>
>>
>> 
>> Because when I access the _cluster/health endpoint, a status of red is 
>> returned:
>>
>> {
>>   "cluster_name": "KR_elasticsearch_PROD",
>>   "status": "red",
>>   "timed_out": false,
>>   "number_of_nodes": 1,
>>   "number_of_data_nodes": 1,
>>   "active_primary_shards": 0,
>>   "active_shards": 0,
>>   "relocating_shards": 0,
>>   "initializing_shards": 0,
>>   "unassigned_shards": 75
>> }
>>
>> So you're correct that the cluster cannot be yellow with primaries
>> unassigned.  However, it would still be good to know why ES would refuse to
>> allocate primary shards if the number of replicas exceeds the number of
>> nodes.
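>>
>> For reference, the reduction itself is just a settings update along these
>> lines (with localhost standing in for a node address):
>>
>> curl -XPUT 'localhost:9200/_settings' -d '{
>>   "index" : { "number_of_replicas" : 0 }
>> }'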
>>
>> Cheers,
>> Mat
>>
>>
>> On Thursday, January 8, 2015 10:03:51 AM UTC+13, Mark Walkom wrote:
>>>
>>> A cluster cannot be yellow if any primaries are unassigned. Are you sure 
>>> it's yellow before you set replicas to 0?
>>>
>>> On 8 January 2015 at 06:52, Mathew D  wrote:
>>>
 I understand there's no point assigning primaries *and* replicas to a 
 single node, but in my case ES won't even allocate a primary (until I 
 reduce the number of replicas)


 On Wednesday, January 7, 2015 4:58:58 PM UTC+13, Mark Walkom wrote:
>
> It's not recommended to run an Elasticsearch cluster across 
> geographically dispersed locations.
>
> You cannot assign both primaries and replicas to a single node, it
> defeats the purpose! So it's as designed.
>
> On 7 January 2015 at 14:08, Mathew D  wrote:
>
>> Hi all,
>>
>> I've encountered some unexpected behaviour during my DR testing which 
>> I'm trying to explain.
>>
>> I have a 3-node geographically-separated cluster with the following 
>> settings:
>>
>> - index.number_of_shards=5
>> - index.number_of_replicas=2
>> - discovery.zen.minimum_master_nodes: 2
>>
>> I use number_of_replicas=2 for durability, so that each node will
>> contain a full set of data (meaning I can lose 2 of my 3 nodes without
>> losing any data).
>>
>> However I am finding that if I shut down 2 nodes, after adjusting
>> minimum_master_nodes on the remaining node to 1 and restarting that node,
>> the cluster stays yellow with all shards unassigned.  They remain in the
>> unassigned state until I manually reduce number_of_replicas down to 1 or
>> 0.  Once number_of_replicas <= number of nodes, the shards reassign and the
>> cluster goes green.  Just wondering if this behaviour is as designed?
>>
>> Regards,
>> Mat
>>
>>
>>

corruption when indexing large number of documents (4 billion+)

2015-01-08 Thread Darshat Shah
Hi, 
We have a 98 node cluster of ES with each node 32GB RAM. 16GB is reserved 
for ES via config file. The index has 98 shards with 2 replicas. 

On this cluster we are loading a large number of documents (when done it 
would be about 10 billion). About 40 million documents are generated per
hour and we are pre-loading several days' worth of documents to prototype
how ES will scale, and its query performance.

Right now we are facing problems getting data pre-loaded. Indexing is 
turned off. We use the NEST client, with a batch size of 10k. To speed up data
load, we distribute the hourly data to each of the 98 nodes to insert in
parallel. This worked ok for a few hours till we got 4.5B documents in the 
cluster. 

After that the cluster state went to red. The outstanding tasks CAT API 
shows errors like below. CPU/Disk/Memory seems to be fine on the nodes. 

Why are we getting these errors, and is there a best practice? Any help is
greatly appreciated since this blocks prototyping ES for our use case.

thanks 
Darshat 

Sample errors: 

source   : shard-failed ([agora_v1][24], node[00ihc1ToRiqMDJ1lou1Sig], [R], s[INITIALIZING]),
           reason [Failed to start shard, message [RecoveryFailedException[[agora_v1][24]:
           Recovery failed from [Shingen Harada][RDAwqX9yRgud9f7YtZAJPg][CH1SCH060051438][inet[/10.46.153.84:9300]]
           into [Elfqueen][00ihc1ToRiqMDJ1lou1Sig][CH1SCH050053435][inet[/10.46.182.106:9300]]];
           nested: RemoteTransportException[[Shingen Harada][inet[/10.46.153.84:9300]][internal:index/shard/recovery/start_recovery]];
           nested: RecoveryEngineException[[agora_v1][24] Phase[1] Execution failed];
           nested: RecoverFilesRecoveryException[[agora_v1][24] Failed to transfer [0] files with total size of [0b]];
           nested: NoSuchFileException[D:\app\ES.ElasticSearch_v010\elasticsearch-1.4.1\data\AP-elasticsearch\nodes\0\indices\agora_v1\24\index\segments_6r]; ]]


AND 

source   : shard-failed ([agora_v1][95], node[PUsHFCStRaecPA6MuvJV9g], [P], s[INITIALIZING]),
           reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[agora_v1][95]
           failed to fetch index version after copying it over];
           nested: CorruptIndexException[[agora_v1][95] Preexisting corrupted index
           [corrupted_1wegvS7BSKSbOYQkX9zJSw] caused by:
           CorruptIndexException[Read past EOF while reading segment infos]
           EOFException[read past EOF: MMapIndexInput(path="D:\app\ES.ElasticSearch_v010\elasticsearch-1.4.1\data\AP-elasticsearch\nodes\0\indices\agora_v1\95\index\segments_11j")]
           org.apache.lucene.index.CorruptIndexException: Read past EOF while reading segment infos
               at org.elasticsearch.index.store.Store.readSegmentsInfo(Store.java:127)
               at org.elasticsearch.index.store.Store.access$400(Store.java:80)
               at org.elasticsearch.index.store.Store$MetadataSnapshot.buildMetadata(Store.java:575)
           ---snip more stack trace---


  



Re: counting items in a list [array] returns (what we think) are incorrect counts via groovy

2015-01-08 Thread Nikolas Everett
On Thu, Jan 8, 2015 at 9:09 PM, Jeff Steinmetz 
wrote:

Is there a better way to do this?
>
> Please see this gist (or even better yet, run the script locally to see
> the issue).
>
> https://gist.github.com/jeffsteinmetz/2ea8329c667386c80fae
>
> You must have scripting enabled in your elasticsearch config for this to
> work.
>
> This was originally based on some comments I found here:
>
> http://stackoverflow.com/questions/17314123/search-by-size-of-object-type-field-elastic-search
>
> We would like to use a filtered query to only include documents that have a
> small count of items in the list [aka array], filtering where
>  values.size() < 10
>
> "script": "doc['titles'].values.size() < 10"
>
> Turns out the values.size() actually either counts tokenized (analyzed)
> words, or if the mapping turns off analysis, it still counts incorrectly if
> there are duplicates.
> If analyze is not turned off, it counts tokenized words, not the number of
> elements in the list.
> If analyze is turned off for a given field, it improves, but duplicates
> are missed.
>
> For example, This comes back as size == 2
> "titles": ["one", "duplicate", "duplicate"]
> This comes back as size == 3, should be 4
> "titles": ["http://bit.ly/abc";, "http://bit.ly/abc";, "http://bit.ly/def";,
> "http://bit.ly/ghi";]
>
> Is this a bug, is there a better way, or is this just something that we
> don't understand about groovy and values.size()?
>
>
>
I think that's just the way doc[] works.  Try (but don't actually deploy)
_source['titles'].size() < 10.  That should do what you expect.  Don't
deploy that because it's too slow.  Try indexing the size and filtering on
it.  You can use a transform to add the size of the array as an integer
field and just filter on it using a range filter.  That'd probably be the
fastest option.
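
Something like this, as an untested sketch with hypothetical index/type
names (the transform computes the count at index time, the range filter
uses it at query time):

curl -XPUT 'localhost:9200/myindex' -d '{
  "mappings": {
    "post": {
      "transform": {
        "script": "ctx._source[\"title_count\"] = ctx._source[\"titles\"].size()",
        "lang": "groovy"
      },
      "properties": {
        "titles": { "type": "string", "index": "not_analyzed" },
        "title_count": { "type": "integer" }
      }
    }
  }
}'

curl -XPOST 'localhost:9200/myindex/post/_search' -d '{
  "query": {
    "filtered": {
      "filter": { "range": { "title_count": { "lt": 10 } } }
    }
  }
}'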

Nik



inspect contents of filter cache

2015-01-08 Thread Srinivasan Ramaswamy
I am trying to inspect the contents of the filter cache to debug a query
performance issue. Is there any way to look at the contents of the filter
cache in Elasticsearch?
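
So far the closest I have found is the cache size and eviction counters
(not the entries themselves), e.g. something like:

curl 'localhost:9200/_nodes/stats/indices/filter_cache?pretty'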

Thanks
Srini



counting items in a list [array] returns (what we think) are incorrect counts via groovy

2015-01-08 Thread Jeff Steinmetz
Is there a better way to do this?

Please see this gist (or even better yet, run the script locally to see the
issue).

https://gist.github.com/jeffsteinmetz/2ea8329c667386c80fae

You must have scripting enabled in your elasticsearch config for this to 
work.

This was originally based on some comments I found here:
http://stackoverflow.com/questions/17314123/search-by-size-of-object-type-field-elastic-search

We would like to use a filtered query to only include documents that have a
small count of items in the list [aka array], filtering where 
 values.size() < 10

"script": "doc['titles'].values.size() < 10"

Turns out the values.size() actually either counts tokenized (analyzed) 
words, or if the mapping turns off analysis, it still counts incorrectly if 
there are duplicates.
If analyze is not turned off, it counts tokenized words, not the number of 
elements in the list.
If analyze is turned off for a given field, it improves, but duplicates are 
missed.

For example, this comes back as size == 2:
"titles": ["one", "duplicate", "duplicate"]
This comes back as size == 3, should be 4:
"titles": ["http://bit.ly/abc", "http://bit.ly/abc", "http://bit.ly/def",
"http://bit.ly/ghi"]

Is this a bug, is there a better way, or is this just something that we 
don't understand about groovy and values.size()?












Re: No failover if number_of_replicas exceeds number of nodes?

2015-01-08 Thread Mark Walkom
It seems highly unusual that this is occurring. I'd recommend that you open
a github issue with details and see what the devs think.

On 8 January 2015 at 09:38, Mathew D  wrote:

> Good point... actually it looks like the 'head' plugin is misreporting a
> yellow status when replicas=2:
>
>
>
> 
> Because when I access the _cluster/health endpoint, a status of red is
> returned:
>
> {
>   "cluster_name": "KR_elasticsearch_PROD",
>   "status": "red",
>   "timed_out": false,
>   "number_of_nodes": 1,
>   "number_of_data_nodes": 1,
>   "active_primary_shards": 0,
>   "active_shards": 0,
>   "relocating_shards": 0,
>   "initializing_shards": 0,
>   "unassigned_shards": 75
> }
>
> So you're correct that the cluster cannot be yellow with primaries
> unassigned.  However, it would still be good to know why ES would refuse to
> allocate primary shards if the number of replicas exceeds the number of
> nodes.
>
> Cheers,
> Mat
>
>
> On Thursday, January 8, 2015 10:03:51 AM UTC+13, Mark Walkom wrote:
>>
>> A cluster cannot be yellow if any primaries are unassigned. Are you sure
>> it's yellow before you set replicas to 0?
>>
>> On 8 January 2015 at 06:52, Mathew D  wrote:
>>
>>> I understand there's no point assigning primaries *and* replicas to a
>>> single node, but in my case ES won't even allocate a primary (until I
>>> reduce the number of replicas)
>>>
>>>
>>> On Wednesday, January 7, 2015 4:58:58 PM UTC+13, Mark Walkom wrote:

 It's not recommended to run an Elasticsearch cluster across
 geographically dispersed locations.

 You cannot assign both primaries and replicas to a single node, it
 defeats the purpose! So it's as designed.

 On 7 January 2015 at 14:08, Mathew D  wrote:

> Hi all,
>
> I've encountered some unexpected behaviour during my DR testing which
> I'm trying to explain.
>
> I have a 3-node geographically-separated cluster with the following
> settings:
>
> - index.number_of_shards=5
> - index.number_of_replicas=2
> - discovery.zen.minimum_master_nodes: 2
>
> I use number_of_replicas=2 for durability, so that each node will
> contain a full set of data (meaning I can lose 2 of my 3 nodes without
> losing any data).
>
> However I am finding that if I shut down 2 nodes, after adjusting
> minimum_master_nodes on the remaining node to 1 and restarting that node,
> the cluster stays yellow with all shards unassigned.  They remain in the
> unassigned state until I manually reduce number_of_replicas down to 1 or
> 0.  Once number_of_replicas <= number of nodes, the shards reassign and the
> cluster goes green.  Just wondering if this behaviour is as designed?
>
> Regards,
> Mat
>
>
>

Re: How can I input data from java to ES in real time?

2015-01-08 Thread joergpra...@gmail.com
You have to try capacity planning for yourself, by adjusting the configuration
of index / shards / nodes and resource control parameters like heap size, to
see how many documents your system can load. There is no fixed rule that can
estimate the maximum data volume an ES cluster is capable of.
Note that you can always add nodes and create new indices; it is possible
to grow a single cluster into hundreds or thousands of nodes ( = machines).

For a rough number, you should estimate your retention data volume: maybe
you must keep the data for some days and can drop it then. Then you have
to estimate the resources for search (query load, filter caches etc.).
All these factors should be taken into consideration for the number of
nodes you may need.
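
For example, with purely hypothetical numbers: 10 million documents per day
at ~1 KB each, kept for 30 days with 1 replica, comes to roughly

10,000,000 x 1 KB x 30 x 2 = ~600 GB

of index data before search overhead, which you can then divide by the
per-node capacity you measured in your own tests.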

Jörg

On Thu, Jan 8, 2015 at 2:15 PM, Marian Valero  wrote:

> Ok! Thank you. And regarding the cluster: because there is so much data,
> and it increases every day, I can't have all the data on only one machine.
> How many clusters do I have to use?
>
> Thanks for all.
>


Re: Upgraded node unable to join cluster while attempting cluster upgrade from 1.3.2 to 1.4.2

2015-01-08 Thread joergpra...@gmail.com
Hi Radu,

the deploy plugin is somewhat limited to "restartable" plugins as it can
only restart services of a plugin, i.e. lifecycle components with carefully
implemented doStart()/doStop() methods.

Most plugins come with modules that add REST endpoints, actions, parsers,
functions etc. as modules by the onModule...() mechanism, so they get
"baked" into the ES node as immutable objects before the node is going up.
This means, the deploy plugin can not change or undo this behavior. (Maybe
some evil hackery will do to a certain degree)

For my use case, I embed a Ratpack server and start my own HTTP web server
that does not use baked-in ES modules, just a restartable service. It is
really delightful to just use a single curl PUT command for rapid
prototyping of a Ratpack web app embedded in ES without a node restart.

A restartable plugin is:

https://github.com/jprante/elasticsearch-plugin-ratpack

Jörg



On Thu, Jan 8, 2015 at 5:24 PM, Radu Gheorghe 
wrote:

> Hello Ben,
>
> Maybe it works if you uninstall the plugin from one node at a time and do
> a rolling restart (sticking to 1.3.2), then do the upgrade with another
> rolling restart, then install the plugin back again with yet another
> rolling restart?
>
> I would understand if you said "no way I do 3 restarts!" :) But maybe this
> will help in future:
> https://github.com/jprante/elasticsearch-plugin-deploy
>
> Best regards,
> Radu
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
> On Wed, Jan 7, 2015 at 5:15 AM, Ben Berg  wrote:
>
>> Hello,
>> I am attempting to upgrade a 10 node cluster from 1.3.2 to 1.4.2 - I
>> upgraded the first node,removed and reinstalled latest versions of plugins
>> - two non-site plugins are river-twitter (version 2.4.1) and jdbc (version
>> 1.4.0.8 and also tried 1.4.0.7) - and when starting the node I see the
>> errors below in logs on upgraded node and node does not join cluster. If i
>> downgrade to 1.3.2, uninstall plugins and reinstall jdbc river plugin
>> version 1.3.0.4 the node properly joins the cluster.
>> Errors from logs:
>> [2015-01-06 21:29:39,970][INFO ][node ] [bi-es1]
>> version[1.4.2], pid[5910], build[927caff/2014-12-16T14:11:12Z]
>> [2015-01-06 21:29:39,971][INFO ][node ] [bi-es1]
>> initializing ...
>> [2015-01-06 21:29:40,009][INFO ][plugins  ] [bi-es1]
>> loaded [river-twitter, jdbc-1.4.0.7-a875ced], sites [head, kopf, bigdesk,
>> paramedic, HQ, whatson]
>> [2015-01-06 21:29:45,587][INFO ][node ] [bi-es1]
>> initialized
>> [2015-01-06 21:29:45,588][INFO ][node ] [bi-es1]
>> starting ...
>> [2015-01-06 21:29:45,960][INFO ][transport] [bi-es1]
>> bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/
>> 192.168.83.231:9300]}
>> [2015-01-06 21:29:45,982][INFO ][discovery] [bi-es1]
>> bi-cluster1/VyTzSKWnQAS9ZCk971fJEg
>> [2015-01-06 21:29:51,906][WARN ][transport.netty  ] [bi-es1]
>> Message not fully read (request) for [28805963] and action
>> [discovery/zen/join/validate], resetting
>> [2015-01-06 21:29:51,915][INFO ][discovery.zen] [bi-es1]
>> failed to send join request to master
>> [[bi-es4][-cczOZfKTw-duMdUO_YSAw][bi-es4][inet[/192.168.83.234:9300]]{master=true}],
>> reason 
>> [RemoteTransportException[[bi-es4][inet[/192.168.83.234:9300]][discovery/zen/join]];
>> nested: 
>> RemoteTransportException[[bi-es1][inet[/192.168.83.231:9300]][discovery/zen/join/validate]];
>> nested: ElasticsearchIllegalArgumentException[No custom index metadata
>> factory registered for type [rivers]]; ]
>> [2015-01-06 21:29:57,028][WARN ][transport.netty  ] [bi-es1]
>> Message not fully read (request) for [28807127] and action
>> [discovery/zen/join/validate], resetting
>> [2015-01-06 21:29:57,036][INFO ][discovery.zen] [bi-es1]
>> failed to send join request to master
>> [[bi-es4][-cczOZfKTw-duMdUO_YSAw][bi-es4][inet[/192.168.83.234:9300]]{master=true}],
>> reason 
>> [RemoteTransportException[[bi-es4][inet[/192.168.83.234:9300]][discovery/zen/join]];
>> nested: 
>> RemoteTransportException[[bi-es1][inet[/192.168.83.231:9300]][discovery/zen/join/validate]];
>> nested: ElasticsearchIllegalArgumentException[No custom index metadata
>> factory registered for type [rivers]]; ]
>> [2015-01-06 21:30:02,245][WARN ][transport.netty  ] [bi-es1]
>> Message not fully read (request) for [28808254] and action
>> [discovery/zen/join/validate], resetting
>> [2015-01-06 21:30:02,252][INFO ][discovery.zen] [bi-es1]
>> failed to send join request to master
>> [[bi-es4][-cczOZfKTw-duMdUO_YSAw][bi-es4][inet[/192.168.83.234:9300]]{master=true}],
>> reason 
>> [RemoteTransportException[[bi-es4][inet[/192.168.83.234:9300]][discovery/zen/join]];
>> nested: 
>> RemoteTransportException[[bi-es1][inet[/192.168.83.231:9300]][discovery/zen/join/validate]];
>> nested: Elasticsear

Re: Marvel issue with Elasticsearch 1.4.2 version setup using Oracle Java 1.8.0.25

2015-01-08 Thread ajay . bh111
Update: if Marvel is deployed on the data cluster itself, i.e. data is not
shipped to the monitoring nodes, it works fine.
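
Possibly relevant: the bytes (47,45,54,20) are ASCII for "GET ", i.e. an
HTTP request arriving on the transport port 9300. If so, pointing the
exporter at the HTTP port may be the fix - a sketch, assuming the default
HTTP port 9200:

marvel.agent.exporter.es.hosts: ["es-orn-mon-01:9200", "es-orn-mon-02:9200"]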
Thanks
Ajay


On Thursday, January 8, 2015 at 2:08:10 PM UTC-5, ajay@gmail.com wrote:
>
>
> I am trying to set up a monitoring cluster for Marvel in a test setup using
> the new elasticsearch 1.4.2 and the latest marvel (pulled with the
> bin/plugin -i elasticsearch/marvel/latest command). Without marvel, ES
> starts on the marvel nodes without any error. When ES nodes try to connect
> to the monitoring nodes, the following error is thrown on the Marvel nodes
> (java version is 1.8.0.25 on all nodes; all nodes same OS, Ubuntu
> 3.13.0-40-generic x64):
>
>
> [2015-01-08 18:51:05,993][WARN ][transport.netty  ] [es-orn-mon-02]
> exception caught on transport layer [[id: 0xf67ca3df, /10.236.54.121:52237
> => /10.236.54.81:9300]], closing connection
> java.io.StreamCorruptedException: invalid internal transport message
> format, got (47,45,54,20)
> at 
> org.elasticsearch.transport.netty.SizeHeaderFrameDecoder.decode(SizeHeaderFrameDecoder.java:47)
> at 
> org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:425)
> at 
> org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
> at 
> org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> at 
> org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
> at 
> org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
> at 
> org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
> at 
> org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
> at 
> org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
> at 
> org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
> at 
> org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
> at 
> org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
> at 
> org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
> at 
> org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
> at 
> org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
> at 
> org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
> at 
> org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
> at 
> org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
>
> On ES cluster nodes following error is thrown : (cluster nodes are able to 
> connect on port 9300 of monitoring nodes)
>
> [2015-01-08 18:57:10,545][ERROR][marvel.agent.exporter] [es-orn-c-01] 
> failed to verify/upload the marvel template to [es-orn-mon-01:9300]:
> Unexpected end of file from server
> [2015-01-08 18:57:10,550][ERROR][marvel.agent.exporter] [es-orn-c-01] 
> failed to verify/upload the marvel template to [es-orn-mon-02:9300]:
> Unexpected end of file from server
> [2015-01-08 18:57:10,550][ERROR][marvel.agent.exporter] [es-orn-c-01] 
> could not connect to any configured elasticsearch instances: 
> [es-orn-mon-01:9300,es-orn-mon-02:9300]
>  
> Elasticsearch alone works fine on both monitoring and data cluster if 
> marvel plugin is removed.
>
> Thanks
> Ajay
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d0778bae-f11f-4cb0-a287-77b67c350349%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Upgraded node unable to join cluster while attempting cluster upgrade from 1.3.2 to 1.4.2

2015-01-08 Thread Ben Berg
Ended up finding out that the jdbc plugin requires a full cluster shutdown 
for restart - https://github.com/jprante/elasticsearch-river-jdbc/issues/433
Going to be doing that in maintenance window tomorrow and will verify that 
worked.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/60577717-944f-4b13-b66d-124cfb2e7b39%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Concurrency problem when automatically creating an index

2015-01-08 Thread Tom
4 nodes.

El jueves, 8 de enero de 2015 16:19:50 UTC-3, Jörg Prante escribió:
>
> How many nodes do you have in the cluster?
>
> Jörg
>
> On Thu, Jan 8, 2015 at 6:57 PM, Tom wrote:
>
>> Hi, we've been using ES for a while now - specifically version 0.90.3. A 
>> couple of months ago we decided to migrate to the latest version, which was 
>> finally frozen to be 1.4.1. No data migration was necessary because we have 
>> a redundant MongoDB, but yesterday we enabled data writing to the new ES 
>> cluster. Everything was running smoothly until we noticed that, on the hour, 
>> there were bursts of four or five log messages of the following kinds:
>>
>> Error indexing None into index ind-analytics-2015.01.08. Total elapsed 
>> time: 1065 ms. 
>> org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: 
>> failed to process cluster event (acquire index lock) within 1s
>> at 
>> org.elasticsearch.cluster.metadata.MetaDataCreateIndexService$1.run(MetaDataCreateIndexService.java:148)
>>  
>> ~[org.elasticsearch.elasticsearch-1.4.1.jar:na]
>> at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>  
>> ~[na:1.7.0_17]
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>  
>> ~[na:1.7.0_17]
>> at java.lang.Thread.run(Thread.java:722) ~[na:1.7.0_17]
>>
>> [ForkJoinPool-2-worker-15] c.d.i.p.ActorScatterGatherStrategy - 
>> Scattering to failed in 1043ms 
>> org.elasticsearch.action.UnavailableShardsException: [ind-2015.01.08.00][0] 
>> Not enough active copies to meet write consistency of [QUORUM] (have 1, 
>> needed 2). Timeout: [1s], request: index {[ind-2015.01.08.00][search][...]}
>> at 
>> org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.retryBecauseUnavailable(TransportShardReplicationOperationAction.java:784)
>>  
>> ~[org.elasticsearch.elasticsearch-1.4.1.jar:na]
>> at 
>> org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.raiseFailureIfHaveNotEnoughActiveShardCopies(TransportShardReplicationOperationAction.java:776)
>>  
>> ~[org.elasticsearch.elasticsearch-1.4.1.jar:na]
>> at 
>> org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:507)
>>  
>> ~[org.elasticsearch.elasticsearch-1.4.1.jar:na]
>> at 
>> org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:419)
>>  
>> ~[org.elasticsearch.elasticsearch-1.4.1.jar:na]
>> at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>  
>> ~[na:1.7.0_17]
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>  
>> ~[na:1.7.0_17]
>> at java.lang.Thread.run(Thread.java:722) ~[na:1.7.0_17]
>>
>> This occurs on the hour because we write to hour-based indices. 
>> For example, all writes from 18:00:00 to 18:59:59 on 01/08 go to 
>> ind-2015.01.08.18. At 19:00:00 all writes go to ind-2015.01.08.19, and 
>> so on.
>>
>> With version 0.90.3 of ES, automatic index creation was working 
>> flawlessly (with no complaints), but the new version doesn't seem to handle 
>> that feature very well. It looks like, when all those concurrent writes 
>> compete to be the first to create the index, all but one fails. Of course 
>> we could just create such indices manually to avoid this situation 
>> altogether, but this would only be a workaround for a feature that 
>> previously worked.
>>
>> Also, we use ES through the native Java client and the configuration for 
>> all our indices is 
>>
>> settings = {
>>   number_of_shards = 5,
>>   number_of_replicas = 2
>> }
>>
>> Any ideas?
>>
>> Thanks in advance,
>> Tom;
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/4deefb09-bed1-499a-b9fc-3ed4d78fc4c0%40googlegroups.com
>>  
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/61a15d1a-02f6-484d-8ce7-862bfe427f17%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Concurrency problem when automatically creating an index

2015-01-08 Thread joergpra...@gmail.com
How many nodes do you have in the cluster?

Jörg

On Thu, Jan 8, 2015 at 6:57 PM, Tom  wrote:

> Hi, we've been using ES for a while now - specifically version 0.90.3. A
> couple of months ago we decided to migrate to the latest version, which was
> finally frozen to be 1.4.1. No data migration was necessary because we have
> a redundant MongoDB, but yesterday we enabled data writing to the new ES
> cluster. Everything was running smoothly until we noticed that, on the hour,
> there were bursts of four or five log messages of the following kinds:
>
> Error indexing None into index ind-analytics-2015.01.08. Total elapsed
> time: 1065 ms.
> org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException:
> failed to process cluster event (acquire index lock) within 1s
> at
> org.elasticsearch.cluster.metadata.MetaDataCreateIndexService$1.run(MetaDataCreateIndexService.java:148)
> ~[org.elasticsearch.elasticsearch-1.4.1.jar:na]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> ~[na:1.7.0_17]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> ~[na:1.7.0_17]
> at java.lang.Thread.run(Thread.java:722) ~[na:1.7.0_17]
>
> [ForkJoinPool-2-worker-15] c.d.i.p.ActorScatterGatherStrategy - Scattering
> to failed in 1043ms org.elasticsearch.action.UnavailableShardsException:
> [ind-2015.01.08.00][0] Not enough active copies to meet write consistency
> of [QUORUM] (have 1, needed 2). Timeout: [1s], request: index
> {[ind-2015.01.08.00][search][...]}
> at
> org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.retryBecauseUnavailable(TransportShardReplicationOperationAction.java:784)
> ~[org.elasticsearch.elasticsearch-1.4.1.jar:na]
> at
> org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.raiseFailureIfHaveNotEnoughActiveShardCopies(TransportShardReplicationOperationAction.java:776)
> ~[org.elasticsearch.elasticsearch-1.4.1.jar:na]
> at
> org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:507)
> ~[org.elasticsearch.elasticsearch-1.4.1.jar:na]
> at
> org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:419)
> ~[org.elasticsearch.elasticsearch-1.4.1.jar:na]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> ~[na:1.7.0_17]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> ~[na:1.7.0_17]
> at java.lang.Thread.run(Thread.java:722) ~[na:1.7.0_17]
>
> This occurs on the hour because we write to hour-based indices. For
> example, all writes from 18:00:00 to 18:59:59 on 01/08 go to
> ind-2015.01.08.18. At 19:00:00 all writes go to ind-2015.01.08.19, and
> so on.
>
> With version 0.90.3 of ES, automatic index creation was working flawlessly
> (with no complaints), but the new version doesn't seem to handle that
> feature very well. It looks like, when all those concurrent writes compete
> to be the first to create the index, all but one fails. Of course we could
> just create such indices manually to avoid this situation altogether, but
> this would only be a workaround for a feature that previously worked.
>
> Also, we use ES through the native Java client and the configuration for
> all our indices is
>
> settings = {
>   number_of_shards = 5,
>   number_of_replicas = 2
> }
>
> Any ideas?
>
> Thanks in advance,
> Tom;
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/4deefb09-bed1-499a-b9fc-3ed4d78fc4c0%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoEL7i_6Nugjw4XDMAuAK9o6a14%2BDiah9wA37gCtpmf%3DwQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Building Our Own Security for Inter-Node Communication in ES Cluster??

2015-01-08 Thread Tri Nguyen
Hi,

Where should I look, and what should I look for, if I want to build and 
integrate security for inter-node communication in an ES cluster?

The security best practices and security plugins, except for Shield, seem 
to address only client access to ES. 

What should I do if I want to use SSL/TLS for encrypting inter-node 
communication in an ES cluster?

Any pointers or guidance would be appreciated. 

Regards

Tri M. Nguyen

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/dfe9ce12-20cc-460b-afae-463602d090c3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Marvel issue with Elasticsearch 1.4.2 version setup using Oracle Java 1.8.0.25

2015-01-08 Thread ajay . bh111

I am trying to set up a Marvel monitoring cluster in a test setup, using the 
new elasticsearch 1.4.2 and the latest marvel (pulled with the bin/plugin -i 
elasticsearch/marvel/latest command). Without marvel, ES starts on the marvel 
nodes without any error. When the ES nodes try to connect to the monitoring 
nodes, the following error is thrown on the Marvel nodes (Java version is 
1.8.0_25 on all nodes; all nodes run the same OS, Ubuntu, kernel 
3.13.0-40-generic, x64):


[2015-01-08 18:51:05,993][WARN ][transport.netty  ] [es-orn-mon-02] 
exception caught on transport layer [[id: 0xf67ca3df, 
/10.236.54.121:52237 => /10.236.54.81:9300]], closing connection
java.io.StreamCorruptedException: invalid internal transport 
message format, got (47,45,54,20)
at 
org.elasticsearch.transport.netty.SizeHeaderFrameDecoder.decode(SizeHeaderFrameDecoder.java:47)
at 
org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:425)
at 
org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at 
org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at 
org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at 
org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at 
org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
at 
org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at 
org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at 
org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at 
org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at 
org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at 
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at 
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
at 
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at 
org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at 
org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at 
org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


On ES cluster nodes following error is thrown : (cluster nodes are able to 
connect on port 9300 of monitoring nodes)

[2015-01-08 18:57:10,545][ERROR][marvel.agent.exporter] [es-orn-c-01] 
failed to verify/upload the marvel template to [es-orn-mon-01:9300]:
Unexpected end of file from server
[2015-01-08 18:57:10,550][ERROR][marvel.agent.exporter] [es-orn-c-01] 
failed to verify/upload the marvel template to [es-orn-mon-02:9300]:
Unexpected end of file from server
[2015-01-08 18:57:10,550][ERROR][marvel.agent.exporter] [es-orn-c-01] 
could not connect to any configured elasticsearch instances: 
[es-orn-mon-01:9300,es-orn-mon-02:9300]
 
Elasticsearch alone works fine on both monitoring and data cluster if 
marvel plugin is removed.

Thanks
Ajay
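
The bytes (47,45,54,20) in the exception are hex for the ASCII characters 
"GET ", which suggests plain HTTP requests are arriving on the transport port 
9300. The Marvel agent ships data over HTTP, so the exporter should point at 
the monitoring nodes' HTTP port (9200 by default), not the transport port. A 
minimal sketch for elasticsearch.yml on the data nodes, assuming the 
monitoring nodes expose HTTP on the default port:

  marvel.agent.exporter.es.hosts: ["es-orn-mon-01:9200", "es-orn-mon-02:9200"]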

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/3cf77cc5-c206-466e-85d0-d1247a18d733%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Upgrade from ES 1.2.x to ES 1.4 or 1.3?

2015-01-08 Thread Bhumir Jhaveri
Did anyone notice this?
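
In the meantime, shard recovery progress after an upgrade like this can be 
watched shard by shard; a sketch, assuming a node reachable on localhost:9200:

  curl 'http://localhost:9200/_cat/recovery?v'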


On Wednesday, January 7, 2015 4:29:58 PM UTC-8, Bhumir Jhaveri wrote:
>
> Here is what I did - 
> I had some data - I bumped up the ES version and then restarted the ES -
>
> It started giving following warnings - 
>
> [2015-01-07 16:26:43,881][WARN ][cluster.action.shard ] [my_node] 
> [data][3] received shard failed for [data][3], 
> node[WIUD-O9XRiyg3Rk5RxHhZw], [P], s[INITIALIZING], indexUUID 
> [781tTXk2Qe6GwwHgXKx2pw], reason [Failed to start shard, message 
> [IndexShardGatewayRecoveryException[[data][3] failed to recover shard]; 
> nested: IllegalArgumentException[No type mapped for [8]]; ]]
> [2015-01-07 16:26:43,887][WARN ][cluster.action.shard ] [my_node] 
> [data][1] sending failed shard for [data][1], node[WIUD-O9XRiyg3Rk5RxHhZw], 
> [P], s[INITIALIZING], indexUUID [781tTXk2Qe6GwwHgXKx2pw], reason [Failed to 
> start shard, message [IndexShardGatewayRecoveryException[[data][1] failed 
> to recover shard]; nested: IllegalArgumentException[No type mapped for 
> [8]]; ]]
> [2015-01-07 16:26:43,967][WARN ][indices.cluster  ] [my_node] 
> [data][4] failed to start shard
> org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: 
> [data][4] failed to recover shard
> at 
> org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:241)
> at 
> org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>
> It's still doing some processing, which I can see from kopf 
> monitoring - but I'm not sure how long it will take, or whether I've 
> screwed up the upgrade here.
>
> On Wednesday, January 7, 2015 11:46:57 AM UTC-8, Mark Walkom wrote:
>>
>> Yes it auto distributes existing, and new, shards.
>>
>> On 8 January 2015 at 05:55, Bhumir Jhaveri  wrote:
>>
>>> Also one more question - let's say initially I have a one-node architecture 
>>> - i.e. everything on one single node, with an additional mount holding all the 
>>> ES data (indices, documents, etc.), around 300 GB - now if 
>>> I add more data nodes, will only new shards be distributed to the 
>>> different nodes, or will the existing data automatically be redistributed too?
>>>
>>>
>>>
>>> On Wednesday, January 7, 2015 9:57:44 AM UTC-8, David Pilato wrote:

 No you don’t have to reindex.
 Elasticsearch can read segments generated with previous elasticsearch 
 version.

 -- 
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com 
 *
 @dadoonet  | @elasticsearchfr 
  | @scrutmydocs 
 


  
 Le 7 janv. 2015 à 18:50, Bhumir Jhaveri  a écrit :

 Alright, cool, David! If this is the pain point then I don't think I 
 should limit this upgrade to only 1.3 - I will go for 1.4.

 One more thing - I already have around 300 GB (max capacity 2 TB) of 
 indexed data available in my additional storage, which I exclusively kept 
 for ES - so far this 300 GB of data has been generated by ES 1.2 - now if 
 I move to ES 1.4, will that cause any issue?
 Do I need to regenerate anything here, or will ES 1.4 automatically 
 accept whatever has been generated by ES 1.2?

 It's more of a backward-compatibility point that I am trying to raise here.



 On Wednesday, January 7, 2015 9:46:50 AM UTC-8, David Pilato wrote:
>
> It’s most likely because on your dev system, you are running out of 
> disk space.
> Elasticsearch 1.4.x does not allocate replicas if you have more than 
> 85% disk usage.
> You can change this settings by modifying for example 
> elasticsearch.yml and set:
>
> cluster.routing.allocation.disk.watermark.low: 1gb
> cluster.routing.allocation.disk.watermark.high: 500mb
>
> Regarding your question, I would go for 1.4.
>
> HTH
>
> -- 
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com 
> *
> @dadoonet  | @elasticsearchfr 
>  | @scrutmydocs 
> 
>
>
>
> Le 7 janv. 2015 à 18:41, Bhumir Jhaveri  a écrit :
>
> I am planning to upgrade from ES 1.2.2 to ES 1.4.2, or 1.3.9, or 
> whichever is the latest stable release in the 1.3 series - so should I go for 
> 1.4 or stay with 1.3?
> The reason I am not committing to an upgrade straight to 1.4 is just 
> in case certain things, like shard allocation to different nodes, 
> suddenly don't work - or maybe I am wrong too - I mean this type of sudden 
> and obvious things just

Re: performance getting even worse after optimization

2015-01-08 Thread Xiaoting Ye
The index is 149 GB, with 19 shards and 1 replica.

The ES version is 1.4.1, and the Java version is 1.7.0_71.

I have a specific routing strategy, and the query used in testing only goes to
one shard:

heap.percent  ram.percent  load
          51           58  0.33

(while it is under continuous querying, just one query at a time)

This specific shard has 22,502,484 docs and is 10 GB in size.

Thanks!
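
For reference, a forced merge down to one segment per shard is requested
explicitly via max_num_segments; a sketch, assuming a hypothetical index
named myindex on localhost:9200:

  curl -XPOST 'http://localhost:9200/myindex/_optimize?max_num_segments=1'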

On Thu, Jan 8, 2015 at 2:10 AM, Mark Walkom  wrote:

> How big is the index, how many shards and replicas?
> What ES version? What java version?
>
> On 8 January 2015 at 20:40, Xiaoting Ye  wrote:
>
>> Hi,
>>
>>  I just did an _optimize operation on a cluster (10 data nodes, roughly
>> 350,000,000 docs in total). This cluster has only one index.
>>
>> However, the performance gets even worse: the response time doubled or
>> even tripled.
>>
>> Any hint on this?
>>
>> Thanks!
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/d9f9ba25-4a7f-4fba-978c-8368d74bc349%40googlegroups.com
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>  --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/W49B4d9MWNk/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAEYi1X8B_qPdBA0S8JeXmhM2e013-YxQb5roZAJvEh-r1rxfQQ%40mail.gmail.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAGVN9k7xZTBpchkOSqp%3DyLxqSKozo1WhVUveq%2BRaLDHmd81kpA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Concurrency problem when automatically creating an index

2015-01-08 Thread Tom
Hi, we've been using ES for a while now - specifically version 0.90.3. A 
couple of months ago we decided to migrate to the latest version, which was 
finally frozen to be 1.4.1. No data migration was necessary because we have 
a redundant MongoDB, but yesterday we enabled data writing to the new ES 
cluster. Everything was running smoothly until we noticed that, on the hour, 
there were bursts of four or five log messages of the following kinds:

Error indexing None into index ind-analytics-2015.01.08. Total elapsed 
time: 1065 ms. 
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: 
failed to process cluster event (acquire index lock) within 1s
at 
org.elasticsearch.cluster.metadata.MetaDataCreateIndexService$1.run(MetaDataCreateIndexService.java:148)
 
~[org.elasticsearch.elasticsearch-1.4.1.jar:na]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
~[na:1.7.0_17]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
~[na:1.7.0_17]
at java.lang.Thread.run(Thread.java:722) ~[na:1.7.0_17]

[ForkJoinPool-2-worker-15] c.d.i.p.ActorScatterGatherStrategy - Scattering 
to failed in 1043ms org.elasticsearch.action.UnavailableShardsException: 
[ind-2015.01.08.00][0] Not enough active copies to meet write consistency 
of [QUORUM] (have 1, needed 2). Timeout: [1s], request: index 
{[ind-2015.01.08.00][search][...]}
at 
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.retryBecauseUnavailable(TransportShardReplicationOperationAction.java:784)
 
~[org.elasticsearch.elasticsearch-1.4.1.jar:na]
at 
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.raiseFailureIfHaveNotEnoughActiveShardCopies(TransportShardReplicationOperationAction.java:776)
 
~[org.elasticsearch.elasticsearch-1.4.1.jar:na]
at 
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:507)
 
~[org.elasticsearch.elasticsearch-1.4.1.jar:na]
at 
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:419)
 
~[org.elasticsearch.elasticsearch-1.4.1.jar:na]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
~[na:1.7.0_17]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
~[na:1.7.0_17]
at java.lang.Thread.run(Thread.java:722) ~[na:1.7.0_17]

This occurs on the hour because we write to hour-based indices. For 
example, all writes from 18:00:00 to 18:59:59 on 01/08 go to 
ind-2015.01.08.18. At 19:00:00 all writes go to ind-2015.01.08.19, and 
so on.

With version 0.90.3 of ES, automatic index creation was working flawlessly 
(with no complaints), but the new version doesn't seem to handle that 
feature very well. It looks like, when all those concurrent writes compete 
to be the first to create the index, all but one fails. Of course we could 
just create such indices manually to avoid this situation altogether, but 
this would only be a workaround for a feature that previously worked.

Also, we use ES through the native Java client and the configuration for 
all our indices is 

settings = {
  number_of_shards = 5,
  number_of_replicas = 2
}

Any ideas?

Thanks in advance,
Tom;
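
The manual workaround mentioned above could be automated: pre-create the 
upcoming hourly index shortly before the hour rolls over (e.g. from cron), so 
the concurrent writers never race on index creation. A minimal sketch, 
assuming a node reachable on localhost:9200:

  curl -XPUT 'http://localhost:9200/ind-2015.01.08.19' -d '{
    "settings" : {
      "number_of_shards" : 5,
      "number_of_replicas" : 2
    }
  }'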

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/4deefb09-bed1-499a-b9fc-3ed4d78fc4c0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Seeing Frequent NodeNotConnectedException errors

2015-01-08 Thread Ranga
The issue I am seeing seems similar to what was reported 
at 
https://groups.google.com/forum/#!searchin/elasticsearch/sporadic$20/elasticsearch/jUsoUV3_mbo/nM1OtJ9tmW0J

I enabled more logging on the transport layer and I see the following 
exceptions when the disconnect/reconnect happens.

[2015-01-08 17:32:18,216][TRACE][transport.netty  ] [es1] close 
connection exception caught on transport layer [[id: 0xc4e4b9a1, 
/10.152.16.37:59038 => /10.109.172.201:9300]], disconnecting from relevant 
node
java.io.IOException: Connection timed out
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
at 
org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
at 
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at 
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
at 
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at 
org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at 
org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at 
org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2015-01-08 17:32:18,217][DEBUG][transport.netty  ] [es1] 
disconnecting from 
[[es2][jAhWlTiKTASdHDQaZGVncw][ip-10-109-172-201][inet[/10.109.172.201:9300]]], 
channel closed event
[2015-01-08 17:32:18,217][TRACE][transport.netty  ] [es1] 
disconnected from 
[[es2][jAhWlTiKTASdHDQaZGVncw][ip-10-109-172-201][inet[/10.109.172.201:9300]]], 
channel closed event

...
[2015-01-08 17:32:23,400][DEBUG][transport.netty  ] [es1] connected 
to node 
[[es2][jAhWlTiKTASdHDQaZGVncw][ip-10-109-172-201][inet[/10.109.172.201:9300]]]
[2015-01-08 17:32:24,546][INFO ][cluster.service  ] [es1] removed 
{[es2][jAhWlTiKTASdHDQaZGVncw][ip-10-109-172-201][inet[/10.109.172.201:9300]],},
 
reason: 
zen-disco-node_failed([es2][jAhWlTiKTASdHDQaZGVncw][ip-10-109-172-201][inet[/10.109.172.201:9300]]),
 
reason transport disconnected
[2015-01-08 17:32:26,979][DEBUG][transport.netty  ] [es1] 
disconnecting from 
[[es2][jAhWlTiKTASdHDQaZGVncw][ip-10-109-172-201][inet[/10.109.172.201:9300]]] 
due to explicit disconnect call
[2015-01-08 17:32:26,980][TRACE][transport.netty  ] [es1] 
disconnected from 
[[es2][jAhWlTiKTASdHDQaZGVncw][ip-10-109-172-201][inet[/10.109.172.201:9300]]] 
due to explicit disconnect call
[2015-01-08 17:32:27,075][TRACE][transport.netty  ] [es1] channel 
closed: [id: 0xdbb6e4b0, /10.109.172.201:45127 => /10.152.16.37:9300]
[2015-01-08 17:32:27,076][TRACE][transport.netty  ] [es1] channel 
closed: [id: 0x211fc69e, /10.109.172.201:45128 => /10.152.16.37:9300]
[2015-01-08 17:32:27,076][TRACE][transport.netty  ] [es1] channel 
closed: [id: 0x8bba710f, /10.109.172.201:45130 => /10.152.16.37:9300]
[2015-01-08 17:32:27,076][TRACE][transport.netty  ] [es1] channel 
closed: [id: 0x79f540c2, /10.109.172.201:45129 => /10.152.16.37:9300]
[2015-01-08 17:32:27,076][TRACE][transport.netty  ] [es1] channel 
closed: [id: 0xc65857d4, /10.109.172.201:45133 => /10.152.16.37:9300]
[2015-01-08 17:32:27,076][TRACE][transport.netty  ] [es1] channel 
closed: [id: 0xbca56b4c, /10.109.172.201:45131 => /10.152.16.37:9300]
[2015-01-08 17:32:27,076][TRACE][transport.netty  ] [es1] channel 
closed: [id: 0xa3b3f93f, /10.109.172.201:45132 => /10.152.16.37:9300]
[2015-01-08 17:32:27,076][TRACE][transport.netty  ] [es1] channel 
closed: [id: 0x1400e583, /10.109.172.201:45134 => /10.152.16.37:9300]
[2015-01-08 17:32:27,076][TRACE][transport.netty  ] [es1] channel 
closed: [id: 0x72d6fccf, /10.109.172.201:45138 => /10.152.16.37:9300]
[2015-01-08 17:32:27,076][TRACE][transport.netty  ] [es1] channel 
closed: [id: 0xfb686698, /10.109.172.201:45126 => /10.152.16.37:9300]
[2015-01-08 17:32:27,076][TRACE][transport.netty  ] [es1] channel 
closed: [id: 0xb1633576, /10.109.172.201:45136 => /10.152.16.37:9300]
[2015-01-08 17:32:27,076][TRACE][transport.netty  ] [es1] channel 
closed: [id: 0x782b9080, /10.109.172.201:45137 => /10.152.16.37:9300]
[2015-01-08 17:32:27,077][TRACE][transport.netty  ] [es1] channel 
closed: [id: 0x7ded7f98, /10.109.172.201:45135 => /10.152.16.37:9300]

Re: Upgraded node unable to join cluster while attempting cluster upgrade from 1.3.2 to 1.4.2

2015-01-08 Thread Ben Berg
Thanks for the reply!

That is a good idea to try, but another problem is that when I 
uninstall the plugins and stay at version 1.3.2, the node will not join the 
cluster either.
We upgraded to Java 8 right before starting this upgrade, and I'm thinking 
the problem may have something to do with that. I also noticed that somehow 
JAVA_HOME was not defined - I'm working on setting that up across the cluster 
and will attempt to remove the plugin while on 1.3.2 and get the node to join 
the cluster; if that does not work I will downgrade to Java 7 and try again.
Thanks again for the reply and insights, and huge thanks for the link to the 
plugin repo - it looks awesome and I will probably try it after sorting this 
out. I will post an update with my findings later today, hopefully.
Ben

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b5dfdbb1-507e-4032-9ec2-40d91266432b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Upgraded node unable to join cluster while attempting cluster upgrade from 1.3.2 to 1.4.2

2015-01-08 Thread Radu Gheorghe
Hello Ben,

Maybe it works if you uninstall the plugin from one node at a time and do a
rolling restart (sticking to 1.3.2), then do the upgrade with another
rolling restart, then install the plugin back again with yet another
rolling restart?

I would understand if you said "no way I do 3 restarts!" :) But maybe this
will help in future: https://github.com/jprante/elasticsearch-plugin-deploy

Best regards,
Radu
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/
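
For each of those rolling restarts, shard reallocation is typically disabled
first and re-enabled afterwards; a minimal sketch, assuming a node reachable
on localhost:9200:

  curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
    "transient" : { "cluster.routing.allocation.enable" : "none" }
  }'
  # ...restart the node, wait for it to rejoin, then re-enable:
  curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
    "transient" : { "cluster.routing.allocation.enable" : "all" }
  }'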

On Wed, Jan 7, 2015 at 5:15 AM, Ben Berg  wrote:

> Hello,
> I am attempting to upgrade a 10-node cluster from 1.3.2 to 1.4.2 - I
> upgraded the first node, removed and reinstalled the latest versions of plugins
> - the two non-site plugins are river-twitter (version 2.4.1) and jdbc (version
> 1.4.0.8, and I also tried 1.4.0.7) - and when starting the node I see the
> errors below in the logs on the upgraded node, and the node does not join the
> cluster. If I downgrade to 1.3.2, uninstall the plugins and reinstall the jdbc
> river plugin version 1.3.0.4, the node properly joins the cluster.
> Errors from logs:
> [2015-01-06 21:29:39,970][INFO ][node ] [bi-es1]
> version[1.4.2], pid[5910], build[927caff/2014-12-16T14:11:12Z]
> [2015-01-06 21:29:39,971][INFO ][node ] [bi-es1]
> initializing ...
> [2015-01-06 21:29:40,009][INFO ][plugins  ] [bi-es1]
> loaded [river-twitter, jdbc-1.4.0.7-a875ced], sites [head, kopf, bigdesk,
> paramedic, HQ, whatson]
> [2015-01-06 21:29:45,587][INFO ][node ] [bi-es1]
> initialized
> [2015-01-06 21:29:45,588][INFO ][node ] [bi-es1]
> starting ...
> [2015-01-06 21:29:45,960][INFO ][transport] [bi-es1]
> bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/
> 192.168.83.231:9300]}
> [2015-01-06 21:29:45,982][INFO ][discovery] [bi-es1]
> bi-cluster1/VyTzSKWnQAS9ZCk971fJEg
> [2015-01-06 21:29:51,906][WARN ][transport.netty  ] [bi-es1]
> Message not fully read (request) for [28805963] and action
> [discovery/zen/join/validate], resetting
> [2015-01-06 21:29:51,915][INFO ][discovery.zen] [bi-es1]
> failed to send join request to master
> [[bi-es4][-cczOZfKTw-duMdUO_YSAw][bi-es4][inet[/192.168.83.234:9300]]{master=true}],
> reason 
> [RemoteTransportException[[bi-es4][inet[/192.168.83.234:9300]][discovery/zen/join]];
> nested: 
> RemoteTransportException[[bi-es1][inet[/192.168.83.231:9300]][discovery/zen/join/validate]];
> nested: ElasticsearchIllegalArgumentException[No custom index metadata
> factory registered for type [rivers]]; ]
> [2015-01-06 21:29:57,028][WARN ][transport.netty  ] [bi-es1]
> Message not fully read (request) for [28807127] and action
> [discovery/zen/join/validate], resetting
> [2015-01-06 21:29:57,036][INFO ][discovery.zen] [bi-es1]
> failed to send join request to master
> [[bi-es4][-cczOZfKTw-duMdUO_YSAw][bi-es4][inet[/192.168.83.234:9300]]{master=true}],
> reason 
> [RemoteTransportException[[bi-es4][inet[/192.168.83.234:9300]][discovery/zen/join]];
> nested: 
> RemoteTransportException[[bi-es1][inet[/192.168.83.231:9300]][discovery/zen/join/validate]];
> nested: ElasticsearchIllegalArgumentException[No custom index metadata
> factory registered for type [rivers]]; ]
> [2015-01-06 21:30:02,245][WARN ][transport.netty  ] [bi-es1]
> Message not fully read (request) for [28808254] and action
> [discovery/zen/join/validate], resetting
> [2015-01-06 21:30:02,252][INFO ][discovery.zen] [bi-es1]
> failed to send join request to master
> [[bi-es4][-cczOZfKTw-duMdUO_YSAw][bi-es4][inet[/192.168.83.234:9300]]{master=true}],
> reason 
> [RemoteTransportException[[bi-es4][inet[/192.168.83.234:9300]][discovery/zen/join]];
> nested: 
> RemoteTransportException[[bi-es1][inet[/192.168.83.231:9300]][discovery/zen/join/validate]];
> nested: ElasticsearchIllegalArgumentException[No custom index metadata
> factory registered for type [rivers]]; ]
> [2015-01-06 21:30:07,576][WARN ][transport.netty  ] [bi-es1]
> Message not fully read (request) for [28809377] and action
> [discovery/zen/join/validate], resetting
> [2015-01-06 21:30:07,583][INFO ][discovery.zen] [bi-es1]
> failed to send join request to master
> [[bi-es4][-cczOZfKTw-duMdUO_YSAw][bi-es4][inet[/192.168.83.234:9300]]{master=true}],
> reason 
> [RemoteTransportException[[bi-es4][inet[/192.168.83.234:9300]][discovery/zen/join]];
> nested: 
> RemoteTransportException[[bi-es1][inet[/192.168.83.231:9300]][discovery/zen/join/validate]];
> nested: ElasticsearchIllegalArgumentException[No custom index metadata
> factory registered for type [rivers]]; ]
> [2015-01-06 21:30:12,689][WARN ][transport.netty  ] [bi-es1]
> Message not fully read (request) for [28810050] and action
> [discovery/zen/join/validate], resetting
> [2015-01-06 21:30:12,696][INFO ][discovery.zen] [bi-es1]
> failed to send join request 

Query Help: Grouping and then counting resulting buckets

2015-01-08 Thread Nathan Stott

I'm having an issue getting a query to provide the count that I want. It 
looks like something that may only be supported by "reducers", which are an 
upcoming feature; however, any insight would be appreciated, even if it is 
just confirmation that this is not yet possible.

What I want to do is take a set of person documents, filter them, group the 
remaining persons by address, sort them, and do a top_hits to narrow it down 
to one person per household, and then count the number of "person types" 
that are left. Everything except the last part, which would normally be a 
terms aggregation on the person-type attribute, is straightforward. You 
cannot add a sub-aggregation under top_hits, so it does not currently seem 
possible to accomplish this.

The following is a gist describing the query that we're trying to do:

https://gist.github.com/tcbeutler/1327e0a623bcdc52e89e
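
For reference, a sketch of the aggregation shape described above (the field 
names address, age and person_type, and the filter, are hypothetical 
placeholders):

  {
    "query" : { "filtered" : { "filter" : { "term" : { "active" : true } } } },
    "aggs" : {
      "by_household" : {
        "terms" : { "field" : "address", "size" : 0 },
        "aggs" : {
          "head_of_household" : {
            "top_hits" : { "size" : 1, "sort" : [ { "age" : { "order" : "desc" } } ] }
          }
        }
      }
    }
  }

The missing step is a terms aggregation on person_type over the single 
top_hits result per bucket - exactly the sub-aggregation top_hits does not 
accept.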

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/55ef9fed-6653-4fd3-a0f0-ba5d25d694cd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Seeing Frequent NodeNotConnectedException errors

2015-01-08 Thread Ranga
Hi,

I have a 3-node cluster in EC2, and am seeing frequent 
NodeNotConnectedException-related errors which cause intermittent failures 
during indexing. I'm hoping someone knows what this is about and can help.

Thanks in advance for your help - Here are the details - 

There are 3 nodes (es1, es2 and es3 - all are defined to be 
node.master=true, node.data=true - and es1 is the current master). All 
three nodes are running ES 1.4.2, 15GB heap, r3.xlarge instances, JDK 
1.7.0_72. We are using the AWS-Cloud plugin for ec2 discovery. The 
discovery part works fine I think and we haven't had problems there.

What we are seeing is that the cluster runs fine most of the 
time, but periodically (say once every hour or two) we see failures 
in the logs on es1 (the master node) with both indexing and the node 
[indices:monitor/stats] APIs (these are debug messages) - and they seem to 
be happening because the connection between the master node (es1) and 
either of the other nodes is lost.

I tried searching this mailing list and then configured TCP keep-alive 
settings - I think it helped, but I'm not really sure, since the "node not 
connected" errors are still happening.
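
For reference, Elasticsearch enables TCP keep-alive on transport connections 
via network.tcp.keep_alive (true by default), but how quickly probes are sent 
is controlled by the OS. A sketch of Linux sysctl tuning often used on EC2, 
where idle connections can be silently dropped - the values are illustrative, 
not a recommendation:

  sysctl -w net.ipv4.tcp_keepalive_time=300   # first probe after 5 minutes idle
  sysctl -w net.ipv4.tcp_keepalive_intvl=60   # then probe every 60 seconds
  sysctl -w net.ipv4.tcp_keepalive_probes=5   # give up after 5 failed probes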

Here is a section of the master log  that shows the exceptions:



[2015-01-08 14:02:52,203][DEBUG][action.admin.indices.stats] [es1] [alert][0], node[jAhWlTiKTASdHDQaZGVncw], [P], s[STARTED]: failed to execute [org.elasticsearch.action.admin.indices.stats.IndicesStatsRequest@2a694684]
org.elasticsearch.transport.NodeDisconnectedException: [es2][inet[/10.109.172.201:9300]][indices:monitor/stats[s]] disconnected

[2015-01-08 14:02:52,205][WARN ][action.index ] [es1] Failed to perform indices:data/write/index on remote replica [es2][jAhWlTiKTASdHDQaZGVncw][ip-10-109-172-201][inet[/10.109.172.201:9300]][config][3]
org.elasticsearch.transport.NodeDisconnectedException: [es2][inet[/10.109.172.201:9300]][indices:data/write/index[r]] disconnected
[2015-01-08 14:02:52,206][WARN ][cluster.action.shard ] [es1] [config][3] sending failed shard for [config][3], node[jAhWlTiKTASdHDQaZGVncw], [R], s[STARTED], indexUUID [xnxor01lSTC8dY-0wwPXlQ], reason [Failed to perform [indices:data/write/index] on replica, message [NodeDisconnectedException[[es2][inet[/10.109.172.201:9300]][indices:data/write/index[r]] disconnected]]]
[2015-01-08 14:02:52,206][WARN ][cluster.action.shard ] [es1] [config][3] received shard failed for [config][3], node[jAhWlTiKTASdHDQaZGVncw], [R], s[STARTED], indexUUID [xnxor01lSTC8dY-0wwPXlQ], reason [Failed to perform [indices:data/write/index] on replica, message [NodeDisconnectedException[[es2][inet[/10.109.172.201:9300]][indices:data/write/index[r]] disconnected]]]

[2015-01-08 14:02:52,206][WARN ][action.index ] [es1] Failed to perform indices:data/write/index on remote replica [es2][jAhWlTiKTASdHDQaZGVncw][ip-10-109-172-201][inet[/10.109.172.201:9300]][origin_v0101][0]
org.elasticsearch.transport.NodeDisconnectedException: [es2][inet[/10.109.172.201:9300]][indices:data/write/index[r]] disconnected
[2015-01-08 14:02:52,206][WARN ][cluster.action.shard ] [es1] [origin_v0101][0] sending failed shard for [origin_v0101][0], node[jAhWlTiKTASdHDQaZGVncw], [R], s[STARTED], indexUUID [_G8gVWViS6OoX59MHJtwhA], reason [Failed to perform [indices:data/write/index] on replica, message [NodeDisconnectedException[[es2][inet[/10.109.172.201:9300]][indices:data/write/index[r]] disconnected]]]
[2015-01-08 14:02:52,206][WARN ][cluster.action.shard ] [es1] [origin_v0101][0] received shard failed for [origin_v0101][0], node[jAhWlTiKTASdHDQaZGVncw], [R], s[STARTED], indexUUID [_G8gVWViS6OoX59MHJtwhA], reason [Failed to perform [indices:data/write/index] on replica, message [NodeDisconnectedException[[es2][inet[/10.109.172.201:9300]][indices:data/write/index[r]] disconnected]]]
[2015-01-08 14:02:52,206][WARN ][action.index ] [es1] Failed to perform indices:data/write/index on remote replica [es2][jAhWlTiKTASdHDQaZGVncw][ip-10-109-172-201][inet[/10.109.172.201:9300]][origin_v0101][0]
org.elasticsearch.transport.NodeDisconnectedException: [es2][inet[/10.109.172.201:9300]][indices:data/write/index[r]] disconnected
[2015-01-08 14:02:52,206][WARN ][cluster.action.shard ] [es1] [origin_v0101][0] sending failed shard for [origin_v0101][0], node[jAhWlTiKTASdHDQaZGVncw], [R], s[STARTED], indexUUID [_G8gVWViS6OoX59MHJtwhA], reason [Failed to perform [indices:data/write/index] on replica, message [NodeDisconnectedException[[es2][inet[/10.109.172.201:9300]][indices:data/write/index[r]] disconnected]]]
[2015-01-08 14:02:52,207][WARN ][cluster.action.shard ] [es1] [origin_v0101][0] received shard failed for [origin_v0101][0], node[jAhWlTiKTASdHDQaZGVncw], [R], s[STARTED], indexUUID [_G8gVWViS6OoX59MHJtwhA], re

Re: How can I store 2 different data types in same field of 2 different document?

2015-01-08 Thread Radu Gheorghe
Thanks, David! I had no idea it works until... about one hour ago :)

--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Thu, Jan 8, 2015 at 4:01 PM, David Pilato  wrote:

> Very nice Radu. I love this trick. :)
>
> --
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com
> *
> @dadoonet  | @elasticsearchfr
>  | @scrutmydocs
> 
>
>
>
> Le 8 janv. 2015 à 14:43, Radu Gheorghe  a
> écrit :
>
> Hi Paresh,
>
> If you want to sort on the field, I think it has to be the same type. So
> if you make everything a double, it should work for all numeric fields. To
> do that, you can use dynamic templates
> .
> For example if you have this:
>
>   "mappings" : {
> "_default_" : {
>"dynamic_templates" : [ {
>  "long_to_float" : {
>"match" : "*",
>"match_mapping_type" : "long",
>"mapping" : {
>  "type" : "float"
>}
>  }
>} ]
>  }
>   }
>
> And add a new field with value=32, the field would be mapped as float
> instead of long.
>
> Best regards,
> Radu
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
> On Thu, Jan 8, 2015 at 11:14 AM, Paresh Behede 
> wrote:
>
>> Hi,
>>
>> I have requirement of storing document in elastic search which will have
>> dynamic fields + those fields could have different data types values...
>>
>> For e.g.,
>> Document 1 could have age field with value = 32, so when I would insert
>> 1st document in ES my index mapping will get created and age will be mapped
>> to Integer/Long
>>
>> Now if I get age = 32.5 in another document ES will throw me exception of
>> data type mismatch...
>>
>> Can you suggest what can I do to handle such scenario?
>>
>> As workaround we are creating different fields for different data types
>> like age.long / age.double but this also won't work if I have to do sorting
>> over age field...
>>
>> Kindly suggest...
>>
>> Thanks in advance,
>> Paresh Behede
>>
>>
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/ec663bd5-cf3b-4a3f-8828-03c4c53d3837%40googlegroups.com
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAHXA0_09uEGnDtJegPyZ-FY%2BUeCzDs_N1_%2BPsCYxCHu7_ErZVw%40mail.gmail.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/E0768DFA-EF17-46F2-B488-5EC29A60E37D%40pilato.fr
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAHXA0_1cOrR%3D_bndAjg-5CmL8q1AdJESCpj5NPtqxYeXUTscDw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: How to log input user data (json)

2015-01-08 Thread Przemyslaw
Dear all,

Any comments?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ea34f97a-dee4-4887-83ec-bb281a730f47%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


field not analyzed

2015-01-08 Thread Deve java
Hi,
How can I configure a field as not_analyzed using the Java API?

Thanks
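
A minimal mapping sketch (the type and field names are hypothetical); with the 
1.x Java client, this JSON can be passed as the source of a create-index or 
put-mapping request:

  {
    "my_type" : {
      "properties" : {
        "my_field" : { "type" : "string", "index" : "not_analyzed" }
      }
    }
  }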

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/e779f7e8-5db5-4cb8-9c39-682d54bb9540%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Ignore a field in the scoring

2015-01-08 Thread Roger de Cordova Farias
Thank you very much

2015-01-08 4:35 GMT-02:00 Masaru Hasegawa :

> Hi,
>
> I believe it's intended according to
> https://lucene.apache.org/core/4_10_2/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
> .
> It says:
> --
> Note that CollectionStatistics.maxDoc() is used instead of
> IndexReader#numDocs() because also TermStatistics.docFreq() is used, and
> when the latter is inaccurate, so is CollectionStatistics.maxDoc(), and in
> the same direction. In addition, CollectionStatistics.maxDoc() is more
> efficient to compute
> --
>
> Masaru
>
> On Thu, Jan 8, 2015 at 12:01 AM, Roger de Cordova Farias <
> roger.far...@fontec.inf.br> wrote:
>
>> Thank you for your explanation
>>
>> Do you know if it is a bug or intended behavior?
>>
>> I don't think deleted (marked as deleted) docs should be used at all
>>
>> 2015-01-07 1:53 GMT-02:00 Masaru Hasegawa :
>>
>>> Hi,
>>>
>>> An update is a delete plus an add. I mean, instead of updating the existing
>>> document, it deletes it and adds it as a new document.
>>> And those deleted documents are just marked as deleted and aren't
>>> actually removed from the index until a segment merge.
>>>
>>> IDF doesn't take those deletions into account (it still counts the
>>> deleted-but-not-removed documents).
>>> That's the reason you see a different IDF score (you see both maxDocs and
>>> docFreq are incremented).
>>>
>>> Regarding 424 vs. 0: the document had ID 424 (Lucene's internal ID).
>>> But when the document is updated (delete + add), it gets a new ID, 0, in the
>>> new segment.
>>>
>>> So, I think it's not possible to keep the score stable when you update
>>> documents. You could run optimise with max_num_segments=1 every time you
>>> update documents, but that's not practical (and until the optimise is done,
>>> you see a different score).
>>>
>>>
>>> Masaru
>>>
>>>
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearch+unsubscr...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/etPan.54acade5.625558ec.13b%40citra.local
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>  --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/CAJp2531fazjRDeFMmWLVuoCtCUtbCUMv841O%2BZoFpMJBdcjRDA%40mail.gmail.com
>> 
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAGmu3c1rWBCuaLrwHY818sy%2BcM6wEYzNivcFMjzbqupW_7paAw%40mail.gmail.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>
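
For reference, the optimise call mentioned above can also be limited to
purging the deleted documents that skew the IDF statistics; a sketch,
assuming a hypothetical index named myindex on localhost:9200:

  curl -XPOST 'http://localhost:9200/myindex/_optimize?only_expunge_deletes=true'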

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAJp2533-8TBoyPmfpqj12T_TVb4z%2BrgLKqtuOxRfReajti7WfA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: How can I store 2 different data types in same field of 2 different document?

2015-01-08 Thread David Pilato
Very nice Radu. I love this trick. :)

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet  | @elasticsearchfr 
 | @scrutmydocs 




> Le 8 janv. 2015 à 14:43, Radu Gheorghe  a écrit :
> 
> Hi Paresh,
> 
> If you want to sort on the field, I think it has to be the same type. So if 
> you make everything a double, it should work for all numeric fields. To do 
> that, you can use dynamic templates 
> .
>  For example if you have this:
> 
>   "mappings" : {
> "_default_" : {
>"dynamic_templates" : [ {
>  "long_to_float" : {
>"match" : "*",
>"match_mapping_type" : "long",
>"mapping" : {
>  "type" : "float"
>}
>  }
>} ]
>  }
>   }
> 
> And add a new field with value=32, the field would be mapped as float instead 
> of long.
> 
> Best regards,
> Radu
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/ 
> On Thu, Jan 8, 2015 at 11:14 AM, Paresh Behede  > wrote:
> Hi,
> 
> I have requirement of storing document in elastic search which will have 
> dynamic fields + those fields could have different data types values...
> 
> For e.g., 
> Document 1 could have age field with value = 32, so when I would insert 1st 
> document in ES my index mapping will get created and age will be mapped to 
> Integer/Long
> 
> Now if I get age = 32.5 in another document ES will throw me exception of 
> data type mismatch...
> 
> Can you suggest what can I do to handle such scenario?
> 
> As workaround we are creating different fields for different data types like 
> age.long / age.double but this also won't work if I have to do sorting over 
> age field...
> 
> Kindly suggest...
> 
> Thanks in advance,
> Paresh Behede
> 
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearch+unsubscr...@googlegroups.com 
> .
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/ec663bd5-cf3b-4a3f-8828-03c4c53d3837%40googlegroups.com
>  
> .
> For more options, visit https://groups.google.com/d/optout 
> .
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearch+unsubscr...@googlegroups.com 
> .
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/CAHXA0_09uEGnDtJegPyZ-FY%2BUeCzDs_N1_%2BPsCYxCHu7_ErZVw%40mail.gmail.com
>  
> .
> For more options, visit https://groups.google.com/d/optout 
> .



Re: concurrent search request to elasticsearch

2015-01-08 Thread vipins
Thanks for your prompt response. 

We will surely reduce the number of shards, and add nodes/replicas, for
better search performance.









Re: concurrent search request to elasticsearch

2015-01-08 Thread Radu Gheorghe
OK, now it makes sense. 5 requests with 320 shards might saturate your
queue.

But 320 shards sounds like a lot for one index. I assume you don't need to
scale that very index to 320 nodes (+ replicas). If you can get the number
of shards down (say, to the default of 5) things will surely look better
not only from the queue's perspective, but it should also improve search
performance.
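
Since you're on daily rolling indices, here's a hedged sketch of capping
shards for future indices via an index template (the logs-* pattern is a
placeholder for your actual naming scheme):

curl -XPUT 'localhost:9200/_template/daily_shards' -d '{
  "template": "logs-*",
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}'

Existing indices keep their 320 shards; only indices created after the
template is added pick this up.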

--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Thu, Jan 8, 2015 at 3:46 PM, vipins  wrote:

> Sorry, I was wrong about the number of shards. The actual number of shards
> is 320 for the index I am querying.
>
> We are using rolling indices on a daily basis.
>
> The max queue size is 1000 for the search thread pool.
>
> We overcame the "None of the configured nodes are available" issue by
> setting TCP keep-alive to true.



Re: concurrent search request to elasticsearch

2015-01-08 Thread vipins
Sorry, I was wrong about the number of shards. The actual number of shards
is 320 for the index I am querying.

We are using rolling indices on a daily basis.

The max queue size is 1000 for the search thread pool.

We overcame the "None of the configured nodes are available" issue by
setting TCP keep-alive to true.







Re: How can I store 2 different data types in same field of 2 different document?

2015-01-08 Thread Radu Gheorghe
Hi Paresh,

If you want to sort on a field, I think it has to be the same type across
documents. So if you map every numeric field to float, sorting should work
for all of them. To do that, you can use dynamic templates.
For example, if you have this:

  "mappings" : {
"_default_" : {
   "dynamic_templates" : [ {
 "long_to_float" : {
   "match" : "*",
   "match_mapping_type" : "long",
   "mapping" : {
 "type" : "float"
   }
 }
   } ]
 }
  }

If you then add a new field with value=32, it will be mapped as float
instead of long.
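
A quick way to verify, with a hypothetical index named test (both documents
should index without a conflict, and age should come back as float):

curl -XPUT 'localhost:9200/test' -d '{
  "mappings" : {
    "_default_" : {
      "dynamic_templates" : [ {
        "long_to_float" : {
          "match" : "*",
          "match_mapping_type" : "long",
          "mapping" : { "type" : "float" }
        }
      } ]
    }
  }
}'
curl -XPOST 'localhost:9200/test/doc/1' -d '{"age": 32}'
curl -XPOST 'localhost:9200/test/doc/2' -d '{"age": 32.5}'
curl 'localhost:9200/test/_mapping?pretty'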

Best regards,
Radu
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Thu, Jan 8, 2015 at 11:14 AM, Paresh Behede  wrote:

> Hi,
>
> I have a requirement to store documents in Elasticsearch with dynamic
> fields, and those fields can hold values of different data types...
>
> For e.g.,
> Document 1 could have an age field with value = 32, so when I insert the
> 1st document into ES my index mapping gets created and age is mapped to
> Integer/Long.
>
> Now if I get age = 32.5 in another document, ES will throw a data type
> mismatch exception...
>
> Can you suggest what I can do to handle such a scenario?
>
> As a workaround we are creating different fields for different data types,
> like age.long / age.double, but this also won't work if I have to sort on
> the age field...
>
> Kindly suggest...
>
> Thanks in advance,
> Paresh Behede
>
>



Re: concurrent search request to elasticsearch

2015-01-08 Thread Radu Gheorghe
You're welcome.

So you're saying you're running 5 searches on a single index with 5 shards
(25 per-shard queries in total) and you're getting an error? I assume that
error doesn't say the queue is full because the queue is 1000. Can you post
the full error and also a gist where you reproduce the issue? I might be
missing an essential bit here.

Best regards,
Radu
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Thu, Jan 8, 2015 at 3:14 PM, vipins  wrote:

> Thanks a lot for your detailed response.
> We have got all default settings: a single node and 5 shards. But there
> are a lot of indices, with huge numbers of records.
> Search settings:
>   "threads" : 12,
>   "queuesize" : 1000,
>
> My query is very simple, and it runs on a single index only.
>
> Even with just 5 concurrent requests it is throwing "None of the
> configured nodes are available".
>
> Thanks,
>
>
>



Re: How can I input data from java to ES in real time?

2015-01-08 Thread Marian Valero
OK! Thank you. And regarding the cluster: since there is so much data, and
it grows every day, I can't keep all the data on only one machine. How many
nodes do I have to use?

Thanks for all.



Re: Regarding node architecture initial setup

2015-01-08 Thread Radu Gheorghe
Hello Phani,

Usually the dedicated masters are much smaller than the data nodes, because
they have much less work to do. If the 4 nodes you're talking about are
equal, it might be inefficient to add a 5th so you can have 2 data and 3
master nodes. Maybe for the same budget of adding the 5th you can add 3
small master nodes and keep the 4 as data nodes. Then you'd have
minimum_master_nodes=2. This is the ideal case IMO.

If you don't have lots of data, you can have a setup with 3 nodes: all
master-eligible, only two of them hold data and the 3rd would be a
dedicated master that would act like a tie breaker. The downside is that
when both data nodes are super-busy, you won't have a cluster (the tie
breaker won't be able to get a quorum). But then again, if both your data
nodes are unresponsive, having a working cluster has little value.

You can extend this setup with your 4 nodes: all master-eligible, 3 data
and one dedicated master. You'll have to increase minimum_master_nodes to 3
(otherwise you can have a split-brain). Your cluster will still tolerate
one node going down as in the previous case, but you'll have more capacity.
This might be the best bet (capacity vs safety) if you absolutely have to
stick with the 4 servers.
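
A minimal sketch of setting the quorum for that last setup (4
master-eligible nodes, so minimum_master_nodes=3; the setting is dynamic,
and can also go in elasticsearch.yml):

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "persistent": {
    "discovery.zen.minimum_master_nodes": 3
  }
}'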

Best regards,
Radu
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Thu, Jan 8, 2015 at 11:53 AM,  wrote:

> Hi All
>
>  I have chosen to set up 4 nodes in my cluster. I read about the concept
> of dedicated master nodes and data-only nodes in Elasticsearch. Please
> explain briefly how I can set up a cluster using the above four nodes.
>
>    Suppose I choose N/2+1 for 4 nodes; the minimum number of master nodes
> would be 3, so one node is left over in my cluster. Master nodes only
> manage the cluster and route indexing to the other nodes, i.e. the data
> nodes. To implement replicas, do we need 5 nodes, since I am left with
> only 1 data node where I can keep a replica?
>
>  Otherwise, will the primary shard reside on one of the master nodes
> while the replica is held by the data node? Please explain how to design
> the above scenarios with my four nodes in the cluster.
>
> Thanks
>
> phani
>



Re: concurrent search request to elasticsearch

2015-01-08 Thread vipins
Thanks a lot for your detailed response.
We have got all default settings: a single node and 5 shards. But there
are a lot of indices, with huge numbers of records.
Search settings:
  "threads" : 12,
  "queuesize" : 1000,

My query is very simple, and it runs on a single index only.

Even with just 5 concurrent requests it is throwing "None of the configured
nodes are available".

Thanks,






Re: Is there any solution to do the “NOT IN” functionality in Elasticsearch?

2015-01-08 Thread ES USER
Not sure this really helps you, but it might be easier and more reliable to
do this as two separate queries: the first would just be an agg listing all
distinct users, and the second an agg listing users who have an action of
"signup". Then just subtract the second list from the first (a sketch is
below).
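
A hedged sketch of the two queries, using the index/field names from the
quoted example (size 0 on a terms agg in 1.x means "all buckets", so watch
memory on high-cardinality fields):

curl -s 'localhost:9200/my_index/my_type/_search?search_type=count' -d '{
  "aggs": {
    "all_users": { "terms": { "field": "user_id", "size": 0 } }
  }
}'

curl -s 'localhost:9200/my_index/my_type/_search?search_type=count' -d '{
  "query": { "term": { "action": "signup" } },
  "aggs": {
    "signup_users": { "terms": { "field": "user_id", "size": 0 } }
  }
}'

Subtracting the second bucket list from the first (client-side) leaves the
user_ids that never signed up.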






On Wednesday, January 7, 2015 6:18:34 AM UTC-5, Ho-sang Jeon wrote:
>
> Thanks Adrien Grand.
>
> To clarify my quesion, I have added some example data below.
>
> Here is an example data.
>
> curl -s -XPOST 'localhost:9200/my_index/my_type/1' -d'{ "user_id": 1234, 
> "action": "signup" }'
> curl -s -XPOST 'localhost:9200/my_index/my_type/2' -d'{ "user_id": 1234, 
> "action": "visit" }'
> curl -s -XPOST 'localhost:9200/my_index/my_type/3' -d'{ "user_id": 1234, 
> "action": "visit" }'
> curl -s -XPOST 'localhost:9200/my_index/my_type/4' -d'{ "user_id": 5678, 
> "action": "visit" }'
> curl -s -XPOST 'localhost:9200/my_index/my_type/5' -d'{ "user_id": 5678, 
> "action": "visit" }'
> curl -s -XPOST 'localhost:9200/my_index/my_type/6' -d'{ "user_id": 9012, 
> "action": "signup" }'
> curl -s -XPOST 'localhost:9200/my_index/my_type/7' -d'{ "user_id": 9012, 
> "action": "visit" }'
> curl -s -XPOST 'localhost:9200/my_index/my_type/8' -d'{ "user_id": 9012, 
> "action": "visit" }'
> curl -s -XPOST 'localhost:9200/my_index/my_type/9' -d'{ "user_id": 3456, 
> "action": "visit" }'
> curl -s -XPOST 'localhost:9200/my_index/my_type/10' -d'{ "user_id": 3456, 
> "action": "visit" }'
>
> What I really want to get is the "documents whose user_id DID NOT sign
> up, based on these log data". So, documents [*4, 5, 9, 10*] are the final
> results I want to get.
>
> Is it possible to get the results what I want in Elasticsearch?
> Thanks in advance.  
>



Re: concurrent search request to elasticsearch

2015-01-08 Thread Radu Gheorghe
Hello,

The search threadpool size (that is, how many requests can be actually
worked on at once) defaults to 3 times the number of processors. This might
be reduced in future, though, see:
https://github.com/elasticsearch/elasticsearch/pull/9165

The queue size (how many requests ES can accept before starting to reject
them) defaults to 1000.

From what I understand, this is per node, so the answer to your question
depends on how many processors you have and how many shards get hit by each
search. For example, if a search runs on 3 indices, each with 2 shards
(number of replicas is irrelevant, because the search will only hit one
complete set of data), you'll get 6 requests in the threadpool per search.
If you have two servers with 8 cores each, you have 8*3*2=48 threads
available. So the cluster can work on 48/6=8 such searches at once. On top
of that it can still queue round-down(2 nodes * 1000 queue size / 6 requests
per search)=333 searches until it starts rejecting them.

Regarding your scaling question, I can't give you a direct answer,
unfortunately, because it depends on a whole lot of variables, mainly what
your data, queries and hardware look like and what can be changed. The fact
that your threadpool queue got full is just a symptom; it's not clear to me
what happens in there. I usually see this when there are lots of indices
and/or those indices have lots of shards, so a single search translates into
a lot of requests in the threadpool, filling it up, even if the ES cluster
can keep up with the load. If that's your case, increase the threadpool
queue size and make sure you don't have too many shards per index.
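
To see where you stand, you can check the search threadpool stats, and, if
needed, raise the queue size in elasticsearch.yml (2000 is just an
illustrative value):

curl 'localhost:9200/_cat/thread_pool?v'

# elasticsearch.yml
threadpool.search.queue_size: 2000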

If your cluster can't keep up with the load (a monitoring tool like SPM
should show you that), then the first step is to see where the bottleneck
is. Again, monitoring can give some insight: are queries too expensive, can
they be optimized? Do you have too many cache evictions? Is the heap size
too large or too small? Is memory, I/O or CPU the bottleneck? Things like
that. It could also be that you need more/different hardware.

Finally, you can make scaling ES someone else's problem by using a hosted
service like Logsene. Especially if you're using ES for log- or metric-like
data, you'll get lots of features out of the box, and we expose most of the
ES API so you can plug in your custom stuff.

Best regards,
Radu
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Thu, Jan 8, 2015 at 1:32 PM, vipins  wrote:

> What is the maximum limit on concurrent search requests with default
> Elasticsearch server settings?
>
> I am able to perform only 5 parallel search requests in my application
> with default settings.
>
> How can we improve the scalability of ES search requests, apart from
> increasing the number of nodes/shards and the queue size of the search
> thread pool?
>
> thanks.
>
>
>



Kibana geo polygon support

2015-01-08 Thread Hilla Benita
Hello,

Does Kibana support the geo polygon filter (as in the following example)?
Can I create a filter/query for this?

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-geo-polygon-filter.html


Thanks,
Hilla
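
For reference, a minimal geo_polygon filter taken from the linked page
(person.location is the docs' example geo_point field; substitute your own):

curl -XPOST 'localhost:9200/_search' -d '{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "geo_polygon": {
          "person.location": {
            "points": [
              { "lat": 40, "lon": -70 },
              { "lat": 30, "lon": -80 },
              { "lat": 20, "lon": -90 }
            ]
          }
        }
      }
    }
  }
}'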



concurrent search request to elasticsearch

2015-01-08 Thread vipins
What is the maximum limit on concurrent search requests with default
Elasticsearch server settings?

I am able to perform only 5 parallel search requests in my application with
default settings.

How can we improve the scalability of ES search requests, apart from
increasing the number of nodes/shards and the queue size of the search
thread pool?

thanks.






Re: Bucket query results | top hits performance

2015-01-08 Thread Martijn v Groningen
Michael & Dustin, what should reduce the query time a lot is setting
`collect_mode` to `breadth_first` on the `top-fingerprints` agg. Like this:
GET /_search?search_type=count
{
  "aggs": {
"top-fingerprints": {
  "terms": {
"field": "fingerprint",
"size": 50,
"collect_mode": "breadth_first"
  },
  "aggs": {
"top_tag_hits": {
  "top_hits": {
"size": 1,
"_source": {
  "include": [
"title"
  ]
},
"sort": {
  "_doc": {}
}
  }
}
  }
}
  }
}

By default the top_hits agg will create and maintain a priority hit
queue for all buckets that are created by the terms agg, so also the ones
outside of the top 50, which can potentially be millions. By telling the
terms agg to run in breadth_first mode, the top_hits agg only creates and
maintains a priority hit queue for the top 50 buckets instead of all
buckets. This should make things much better performance-wise. There is one
catch to it: the top_hits can't sort by score any more (which is the
default), because the breadth_first collect mode doesn't buffer scores.
That is why the sort is defined on the top_hits agg. In this example I sort
by Lucene docid, which is kind of arbitrary, because you can't have
control over these sort values, but you can sort by any field in your
mapping.

More information about collect mode:
1)
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_collect_mode
2)
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_preventing_combinatorial_explosions.html#_depth_first_versus_breadth_first

On 8 January 2015 at 10:56, Martijn v Groningen <
martijn.v.gronin...@gmail.com> wrote:

> Michael: I would expect that setting the `size` option on the terms agg
> to a smaller value would have a positive impact on the total query time.
> Feels like I'm missing something. Can you run the hot threads API (
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-hot-threads.html#cluster-nodes-hot-threads)
> while you run the search request that you've shared before? This basically
> gives a cluster-wide stack dump and can perhaps give me an insight into
> why your search request is slow.
>
> Setting the `size` option of the terms agg to 0 will return all buckets
> that can be found on the fingerprint field (which can be millions of
> buckets), so I can see how this can bring down your cluster, because that
> simply doesn't fit in the Java heap space.
>
> Dustin: The `top_hits` aggregation is always nested under a bucket
> aggregator (for example the `terms` bucket aggregator). For each bucket
> the terms aggregator creates, the top_hits aggregator will create a
> priority queue, where it is going to maintain the top N docs that fall
> under that bucket. So the time spent by the top_hits aggregator, like any
> other nested aggregator, depends on the number of buckets being maintained
> during the execution of the search request. With top_hits this is more
> noticeable compared to, for example, a metric agg (min, max, avg etc.),
> because of what the top_hits aggregator does.
>
> On 7 January 2015 at 20:29, Dustin Boswell  wrote:
>
>> I'm curious what the underlying algorithm is for TopHits.
>>
>> My mental model for ordinary aggregations is that there's basically a
>> hash table of (field_value -> count) maintained (for each field being
>> aggregated), and that hash table count is incremented once per document,
>> and then the top K elements of that hash table are returned to the user.
>> So there's O(1) work for each document scored, and then a final O(N*logN)
>> sort on that hash table to get the top K, where N is the number of unique
>> field_values.  It makes sense to me why this implementation would be very
>> fast.
>>
>> My mental model for a top_hits aggregation is that there's a hash table
>> of (field_value -> array(pair(doc_id, score))).  And for each document
>> being scored, that (doc_id, score) is appended to the corresponding array.
>> Again, there's only O(1) work for each document.  At the end, you have to
>> sort each array, and then sort the hash table, and take the top K1 arrays,
>> and the top K2 elements of each array, and then for each doc_id, pull out
>> the relevant fields to return to the user.  So definitely more work (and a
>> lot more memory), but I'm not sure if this would result in the 30x increase
>> in runtime we're seeing.  (And actually, for the special case where
>> top_hits->size == 1, you only need the top (doc_id, score) seen, not a
>> whole array, so that would be a lot faster and less memory. But I
>> understand it needs to be able to handle more general cases.)
>>
>> Is this at all close to how it works?
>>
>> On Tuesday, January 6, 2015 11:20:08 PM UTC-8, Martijn v Groningen wrote:
>>>
>>> Hi 

Won't import any data end up with error

2015-01-08 Thread Pavol Havlik
Hi guys,

I'm trying to get Elasticsearch version 1.4.2 up and running on Mac OS X
10.8.5.
My list of plugins:


"plugins": [
  {
    "name": "jdbc-1.4.0.8-b1a51d0",
    "version": "1.4.0.8",
    "description": "JDBC plugin",
    "jvm": true,
    "site": false
  },
  {
    "name": "jdbc-river",
    "version": "NA",
    "description": "JDBC River",
    "jvm": true,
    "site": false
  },
  {
    "name": "head",
    "version": "NA",
    "description": "No description found.",
    "url": "/_plugin/head/",
    "jvm": false,
    "site": true
  }
]


*When I'm running this:*

curl -XDELETE localhost:9200/_river

curl -XPUT 'localhost:9200/_river/search_river_1/_meta' -d '{
"type" : "jdbc",
"jdbc" : {
"url" : "jdbc:mysql://localhost:3306/licklist",
"user" : "*",
"password" : "*",
"sql" : "select sv.*, 16 AS weight from search_venues sv",
"index" : "search",
"type" : "Venue",
"max_bulk_requests" : 5,
"bulk_flush_interval" : "50s"
}
}'


*I receive this in the log:*

[2015-01-08 10:34:42,356][INFO ][cluster.metadata ] [Paradigm] 
[_river] update_mapping [search_river_1] (dynamic)
[2015-01-08 10:37:27,525][INFO ][cluster.metadata ] [Paradigm] 
[_river] deleting index
[2015-01-08 10:37:28,229][INFO ][cluster.metadata ] [Paradigm] 
[_river] creating index, cause [auto(index api)], shards [1]/[1], mappings 
[search_river_1]
[2015-01-08 10:37:28,254][INFO ][cluster.metadata ] [Paradigm] 
[_river] update_mapping [search_river_1] (dynamic)
[2015-01-08 10:37:28,257][WARN ][river] [Paradigm] 
failed to create river [jdbc][search_river_1]
org.elasticsearch.common.inject.CreationException: Guice creation errors:

1) Error injecting constructor, java.lang.NoSuchMethodError: 
org.xbib.elasticsearch.river.jdbc.RiverSource.driver(Ljava/lang/String;)Lorg/xbib/elasticsearch/river/jdbc/RiverSource;
  at org.xbib.elasticsearch.river.jdbc.JDBCRiver.(Unknown Source)
  while locating org.xbib.elasticsearch.river.jdbc.JDBCRiver
  while locating org.elasticsearch.river.River


I would appreciate your help, thank you.



PHP SDK - "size" problem

2015-01-08 Thread Svetlozar Penev
Hello,

I want the query to return 15 results.

This is my code:
 $params = array();
 $params['body']["size"] = 15; 
 $params['body']['query']['match']["country"] = 53;
 $results = $client->search($params);

and response:
*Error 500 transfer closed with 33476 bytes remaining to read*
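
For reference, the raw request the snippet builds should be equivalent to
the curl below (default localhost:9200 endpoint assumed); if this works from
the command line, the problem is likely in the transport/PHP layer rather
than in ES itself:

curl -s 'localhost:9200/_search' -d '{
  "size": 15,
  "query": { "match": { "country": 53 } }
}'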


I have spent many hours debugging and I really need help.



Re: whether the heap size of http node can be promoted to more than 32 GB?

2015-01-08 Thread Mark Walkom
Yes, but once you go to 32GB and beyond, the Java pointers are no longer
compressed and you lose some efficiency.
You might be able to cope with this if you have a lot of CPU power.

Also look into G1GC; it's not currently recommended, but for larger
heap sizes it's probably worth trialing.
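
For reference, a minimal sketch of giving an http-only node a bigger heap
via the standard environment variable (48g is just an illustrative value;
measure before and after, since compressed pointers are lost past ~32GB):

ES_HEAP_SIZE=48g ./bin/elasticsearch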

On 8 January 2015 at 20:43, yang ming  wrote:

> Hi All,
>
> The http node is independent and does not hold any data. As the guide
> says, 50% of RAM should be left for Lucene.
>
> Since the http node does not hold any data files, does that mean we can
> give it more heap than 32 GB, if the node has 128 GB of RAM?
>
> The reason to do this is that the http node becomes a bottleneck
> when doing big aggregations, e.g. terms aggregations.
>
> So, we want to increase the heap size on the http node. Is it feasible?
>
> Thanks
>



Re: performance getting even worse after optimization

2015-01-08 Thread Mark Walkom
How big is the index, how many shards and replicas?
What ES version? What java version?
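
A quick, hedged way to gather those numbers with the _cat APIs (available on
1.x; default localhost endpoint assumed):

curl 'localhost:9200/_cat/indices?v'   # index size and doc count
curl 'localhost:9200/_cat/shards?v'    # shard and replica layout
curl 'localhost:9200/_nodes?pretty'    # ES and JVM versions per node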

On 8 January 2015 at 20:40, Xiaoting Ye  wrote:

> Hi,
>
>  I just did an _optimize operation on a cluster (10 data nodes, roughly
> 350,000,000 docs in total). This cluster has only one index.
>
> However, the performance gets even worse: the response time doubled or
> even tripled.
>
> Any hint on this?
>
> Thanks!
>



Re: Bucket query results | top hits performance

2015-01-08 Thread Martijn v Groningen
Michael: I would expect that setting the `size` option on the terms agg
to a smaller value would have a positive impact on the total query time.
Feels like I'm missing something. Can you run the hot threads API (
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-hot-threads.html#cluster-nodes-hot-threads)
while you run the search request that you've shared before? This basically
gives a cluster-wide stack dump and can perhaps give me an insight into why
your search request is slow.
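
For reference, a minimal hot threads call to run while the slow search
executes (default localhost endpoint assumed):

curl 'localhost:9200/_nodes/hot_threads'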

Setting the `size` option of the terms agg to 0 will return all buckets
that can be found on the fingerprint field (which can be millions of
buckets), so I can see how this can bring down your cluster, because that
simply doesn't fit in the Java heap space.

Dustin: The `top_hits` aggregation is always nested under a bucket
aggregator (for example the `terms` bucket aggregator). For each bucket the
terms aggregator creates, the top_hits aggregator will create a priority
queue, where it is going to maintain the top N docs that fall under that
bucket. So the time spent by the top_hits aggregator, like any other nested
aggregator, depends on the number of buckets being maintained during the
execution of the search request. With top_hits this is more noticeable
compared to, for example, a metric agg (min, max, avg etc.), because of what
the top_hits aggregator does.

On 7 January 2015 at 20:29, Dustin Boswell  wrote:

> I'm curious what the underlying algorithm is for TopHits.
>
> My mental model for ordinary aggregations is that there's basically a hash
> table of (field_value -> count) maintained (for each field being
> aggregated), and that hash table count is incremented once per document,
> and then the top K elements of that hash table are returned to the user.
> So there's O(1) work for each document scored, and then a final O(N*logN)
> sort on that hash table to get the top K, where N is the number of unique
> field_values.  It makes sense to me why this implementation would be very
> fast.
>
> My mental model for a top_hits aggregation is that there's a hash table of
> (field_value -> array(pair(doc_id, score))).  And for each document being
> scored, that (doc_id, score) is appended to the corresponding array. Again,
> there's only O(1) work for each document.  At the end, you have to sort
> each array, and then sort the hash table, and take the top K1 arrays, and
> the top K2 elements of each array, and then for each doc_id, pull out the
> relevant fields to return to the user.  So definitely more work (and a lot
> more memory), but I'm not sure if this would result in the 30x increase in
> runtime we're seeing.  (And actually, for the special case where
> top_hits->size == 1, you only need the top (doc_id, score) seen, not a
> whole array, so that would be a lot faster and less memory. But I
> understand it needs to be able to handle more general cases.)
>
> Is this at all close to how it works?
>
> On Tuesday, January 6, 2015 11:20:08 PM UTC-8, Martijn v Groningen wrote:
>>
>> Hi Michael,
>>
>> In general the more buckets being returned by the parent aggregator the
>> top_hits is nested in, the more work the top_hits agg needs to do, but I
>> didn't come across performance issues with `size` on terms agg being set to
>> 50 and the time it takes to execute increasing 30 times when top_hits is
>> used. To exclude this on your side, can you play around with the `size`
>> option on terms agg?
>>
>> Also perhaps the _source of your documents are relatively large. How does
>> the top_hits agg perform without the `_source` option on the top_hits agg?
>>
>> Martijn
>>
>> On 6 January 2015 at 22:29, Michael Irani  wrote:
>>
>>> Sure. I simplified the query to keep things focused.
>>>
>>> This query takes about 3 seconds to run:
>>>
>>> {
>>>
>>> "size": 0,
>>>
>>> "aggs": {
>>> "top-fingerprints": {
>>> "terms": {
>>> "field": "fingerprint",
>>> "size": 50
>>> },
>>> "aggs": {
>>> "top_tag_hits": {
>>> "top_hits": {
>>> "size": 1,
>>> "_source": {
>>>"include": [
>>>   "title"
>>>]
>>> }
>>> }
>>> }
>>> }
>>> }
>>> }
>>>
>>> }
>>>
>>>
>>> This one takes about 80 milliseconds:
>>>
>>> {
>>>
>>> "size": 0,
>>>
>>> "aggs": {
>>> "fingerprints": {
>>> "terms": {
>>> "field": "fingerprint",
>>> "size": 100
>>> }
>>> }
>>> }
>>>
>>> }
>>>
>>>
>>> The result's a bit too big to paste here. Anything specific about it you 
>>> want me to expose?
>>>
>>>
>>> Michael.
>>>
>>>
>>> On Tuesday, January 6, 2015 12:14:55 PM UTC-8, Itamar Syn-Hershko wrote:

 Can you sha

Regarding node architecture initial setup

2015-01-08 Thread phani . nadiminti
Hi All

 I have chosen to set up 4 nodes in my cluster. I read about the concept of
dedicated master nodes and data-only nodes in Elasticsearch. Please explain
briefly how I can set up a cluster using the above four nodes.

   Suppose I choose N/2+1 for 4 nodes; the minimum number of master nodes
would be 3, so one node is left over in my cluster. Master nodes only manage
the cluster and route indexing to the other nodes, i.e. the data nodes. To
implement replicas, do we need 5 nodes, since I am left with only 1 data
node where I can keep a replica?

 Otherwise, will the primary shard reside on one of the master nodes while
the replica is held by the data node? Please explain how to design the above
scenarios with my four nodes in the cluster.

Thanks 

phani



whether the heap size of http node can be promoted to more than 32 GB?

2015-01-08 Thread yang ming
Hi All,

The http node is independent and does not hold any data. As the guide says,
50% of RAM should be left for Lucene.

Since the http node does not hold any data files, does that mean we can give
it more heap than 32 GB, if the node has 128 GB of RAM?

The reason to do this is that the http node becomes a bottleneck
when doing big aggregations, e.g. terms aggregations.

So, we want to increase the heap size on the http node. Is it feasible?

Thanks



Re: "attachment" handler type not found

2015-01-08 Thread Shashi

I have Elasticsearch version 1.4.2 and mapper-attachments version 2.4.1.

On Thursday, January 8, 2015 3:03:39 PM UTC+5:30, Shashi wrote:
>
> Hello all,
> I am also getting the same error... please guide.
>
>
> On Friday, August 17, 2012 8:42:22 PM UTC+5:30, msya wrote:
>>
>> Hello, 
>>
>> I downloaded the mapper-attachments plugin and tried to set a mapping
>> where the type is "attachment". I did restart after I installed the
>> plugin. However, I am getting the error "No handler for type
>> [attachment]". How do I resolve this error?
>>
>>
>>
>>
>>
>>



performance getting even worse after optimization

2015-01-08 Thread Xiaoting Ye


Hi,

 I just did an _optimize operation on a cluster (10 data nodes, roughly 
350,000,000 docs in total). This cluster has only one index.

However, the performance gets even worse: the response time doubled or even 
tripled.

Any hint on this?

Thanks!



Re: "attachment" handler type not found

2015-01-08 Thread Shashi
Hello all,
I am also getting the same error... please guide.


On Friday, August 17, 2012 8:42:22 PM UTC+5:30, msya wrote:
>
> Hello, 
>
> I downloaded the mapper-attachments plugin and tried to set a mapping
> where the type is "attachment". I did restart after I installed the
> plugin. However, I am getting the error "No handler for type
> [attachment]". How do I resolve this error?
>
>
>
>
>
>



How can I store 2 different data types in same field of 2 different document?

2015-01-08 Thread Paresh Behede
Hi,

I have a requirement to store documents in Elasticsearch with dynamic
fields, and those fields can hold values of different data types...

For e.g.,
Document 1 could have an age field with value = 32, so when I insert the 1st
document into ES my index mapping gets created and age is mapped to
Integer/Long.

Now if I get age = 32.5 in another document, ES will throw a data type
mismatch exception...

Can you suggest what I can do to handle such a scenario?

As a workaround we are creating different fields for different data types,
like age.long / age.double, but this also won't work if I have to sort on
the age field...

Kindly suggest...

Thanks in advance,
Paresh Behede




Re: NumberFormatException when using regex filter with int field

2015-01-08 Thread Zippy
Yes. Copy the int field value to a string field, either with copy_to or
manually (a sketch is below).
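
A minimal sketch of the copy_to variant (hypothetical index/type/field
names; the regex filter would then target my_int_str):

curl -XPUT 'localhost:9200/my_index' -d '{
  "mappings": {
    "my_type": {
      "properties": {
        "my_int": { "type": "integer", "copy_to": "my_int_str" },
        "my_int_str": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}'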

On Thursday, January 8, 2015 at 09:56:40 UTC+1, AK wrote:
>
> Hello,
> I am using the ES Java API to run queries against ES. I have a field in ES
> which holds int data. When I run a regex filter on that field I'm getting
> a NumberFormatException.
> Can't we use regex with numbers? Is it only for string fields?
> Thanks
>



NumberFormatException when using regex filter with int field

2015-01-08 Thread AK
Hello,
I am using the ES Java API to run queries against ES. I have a field in ES
which holds int data. When I run a regex filter on that field I'm getting a
NumberFormatException.
Can't we use regex with numbers? Is it only for string fields?
Thanks
