Re: Upgraded node unable to join cluster while attempting cluster upgrade from 1.3.2 to 1.4.2

2015-01-09 Thread Radu Gheorghe
Yeah, my general approach so far was to stay away from plugins as much as
possible and use external services instead. Something that your JDBC river
can do as well, as far as I understand.

--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Fri, Jan 9, 2015 at 10:20 AM, joergpra...@gmail.com <
joergpra...@gmail.com> wrote:

> No I didn't :(
>
> I still struggle with stateful plugins - there is no good solution I know
> of and that is the reason of trouble.
>
> Jörg
>
> On Fri, Jan 9, 2015 at 9:01 AM, Radu Gheorghe 
> wrote:
>
>> Thanks Ben and Joerg. Too bad there was no good solution for this particular
>> upgrade; at least we all learned something (well, aside from Joerg, who knew
>> it all from the start :D).
>>
>> Best regards,
>> Radu
>>
>> --
>> Performance Monitoring * Log Analytics * Search Analytics
>> Solr & Elasticsearch Support * http://sematext.com/
>>
>> On Fri, Jan 9, 2015 at 12:28 AM, joergpra...@gmail.com <
>> joergpra...@gmail.com> wrote:
>>
>>> Hi Radu,
>>>
>>> the deploy plugin is somewhat limited to "restartable" plugins as it can
>>> only restart services of a plugin, i.e. lifecycle components with carefully
>>> implemented doStart()/doStop() methods.
>>>
>>> Most plugins come with modules that add REST endpoints, actions,
>>> parsers, functions etc. as modules by the onModule...() mechanism, so they
>>> get "baked" into the ES node as immutable objects before the node is going
>>> up. This means the deploy plugin cannot change or undo this behavior.
>>> (Maybe some evil hackery will do, to a certain degree.)
>>>
>>> For my use case, I embed a Ratpack server and start my own HTTP web
>>> server that does not use baked-in ES modules, just a restartable service.
>>> It is really delightful to just use a single curl PUT command for rapid
>>> prototyping a Ratpack web app embedded in ES without node restart.
>>>
>>> A restartable plugin is:
>>>
>>> https://github.com/jprante/elasticsearch-plugin-ratpack
>>>
>>> Jörg
>>>
>>>
>>>
>>> On Thu, Jan 8, 2015 at 5:24 PM, Radu Gheorghe <
>>> radu.gheor...@sematext.com> wrote:
>>>
>>>> Hello Ben,
>>>>
>>>> Maybe it works if you uninstall the plugin from one node at a time and
>>>> do a rolling restart (sticking to 1.3.2), then do the upgrade with another
>>>> rolling restart, then install the plugin back again with yet another
>>>> rolling restart?
>>>>
>>>> I would understand if you said "no way I do 3 restarts!" :) But maybe
>>>> this will help in future:
>>>> https://github.com/jprante/elasticsearch-plugin-deploy
>>>>
>>>> Best regards,
>>>> Radu
>>>> --
>>>> Performance Monitoring * Log Analytics * Search Analytics
>>>> Solr & Elasticsearch Support * http://sematext.com/
>>>>
>>>> On Wed, Jan 7, 2015 at 5:15 AM, Ben Berg  wrote:
>>>>
>>>>> Hello,
>>>>> I am attempting to upgrade a 10-node cluster from 1.3.2 to 1.4.2. I
>>>>> upgraded the first node and removed and reinstalled the latest versions of
>>>>> the plugins - the two non-site plugins are river-twitter (version 2.4.1) and
>>>>> jdbc (version 1.4.0.8; I also tried 1.4.0.7) - and when starting the node I
>>>>> see the errors below in the logs on the upgraded node, and the node does not
>>>>> join the cluster. If I downgrade to 1.3.2, uninstall the plugins, and
>>>>> reinstall jdbc river plugin version 1.3.0.4, the node joins the cluster
>>>>> properly.
>>>>> Errors from logs:
>>>>> [2015-01-06 21:29:39,970][INFO ][node ] [bi-es1]
>>>>> version[1.4.2], pid[5910], build[927caff/2014-12-16T14:11:12Z]
>>>>> [2015-01-06 21:29:39,971][INFO ][node ] [bi-es1]
>>>>> initializing ...
>>>>> [2015-01-06 21:29:40,009][INFO ][plugins  ] [bi-es1]
>>>>> loaded [river-twitter, jdbc-1.4.0.7-a875ced], sites [head, kopf, bigdesk,
>>>>> paramedic, HQ, whatson]
>>>>> [2015-01-06 21:29:45,587][INFO ][node ] [bi-es1]
>>>>> initialized
>>>>> [2015-01-06 21:29:45,588][INFO ][node ] [bi-es1]
>>>>> starting ...
>>>>> [2015-01-06 21:29:45,960][INFO 

Re: Upgraded node unable to join cluster while attempting cluster upgrade from 1.3.2 to 1.4.2

2015-01-09 Thread Radu Gheorghe
Thanks Ben and Joerg. Too bad there was no good solution for this particular
upgrade; at least we all learned something (well, aside from Joerg, who knew
it all from the start :D).

Best regards,
Radu

--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Fri, Jan 9, 2015 at 12:28 AM, joergpra...@gmail.com <
joergpra...@gmail.com> wrote:

> Hi Radu,
>
> the deploy plugin is somewhat limited to "restartable" plugins as it can
> only restart services of a plugin, i.e. lifecycle components with carefully
> implemented doStart()/doStop() methods.
>
> Most plugins come with modules that add REST endpoints, actions, parsers,
> functions etc. as modules by the onModule...() mechanism, so they get
> "baked" into the ES node as immutable objects before the node is going up.
> This means the deploy plugin cannot change or undo this behavior. (Maybe
> some evil hackery will do, to a certain degree.)
>
> For my use case, I embed a Ratpack server and start my own HTTP web server
> that does not use baked-in ES modules, just a restartable service. It is
> really delightful to just use a single curl PUT command for rapid
> prototyping a Ratpack web app embedded in ES without node restart.
>
> A restartable plugin is:
>
> https://github.com/jprante/elasticsearch-plugin-ratpack
>
> Jörg
>
>
>
> On Thu, Jan 8, 2015 at 5:24 PM, Radu Gheorghe 
> wrote:
>
>> Hello Ben,
>>
>> Maybe it works if you uninstall the plugin from one node at a time and do
>> a rolling restart (sticking to 1.3.2), then do the upgrade with another
>> rolling restart, then install the plugin back again with yet another
>> rolling restart?
>>
>> I would understand if you said "no way I do 3 restarts!" :) But maybe
>> this will help in future:
>> https://github.com/jprante/elasticsearch-plugin-deploy
>>
>> Best regards,
>> Radu
>> --
>> Performance Monitoring * Log Analytics * Search Analytics
>> Solr & Elasticsearch Support * http://sematext.com/
>>
>> On Wed, Jan 7, 2015 at 5:15 AM, Ben Berg  wrote:
>>
>>> Hello,
>>> I am attempting to upgrade a 10-node cluster from 1.3.2 to 1.4.2. I
>>> upgraded the first node and removed and reinstalled the latest versions of
>>> the plugins - the two non-site plugins are river-twitter (version 2.4.1) and
>>> jdbc (version 1.4.0.8; I also tried 1.4.0.7) - and when starting the node I
>>> see the errors below in the logs on the upgraded node, and the node does not
>>> join the cluster. If I downgrade to 1.3.2, uninstall the plugins, and
>>> reinstall jdbc river plugin version 1.3.0.4, the node joins the cluster
>>> properly.
>>> Errors from logs:
>>> [2015-01-06 21:29:39,970][INFO ][node ] [bi-es1]
>>> version[1.4.2], pid[5910], build[927caff/2014-12-16T14:11:12Z]
>>> [2015-01-06 21:29:39,971][INFO ][node ] [bi-es1]
>>> initializing ...
>>> [2015-01-06 21:29:40,009][INFO ][plugins  ] [bi-es1]
>>> loaded [river-twitter, jdbc-1.4.0.7-a875ced], sites [head, kopf, bigdesk,
>>> paramedic, HQ, whatson]
>>> [2015-01-06 21:29:45,587][INFO ][node ] [bi-es1]
>>> initialized
>>> [2015-01-06 21:29:45,588][INFO ][node ] [bi-es1]
>>> starting ...
>>> [2015-01-06 21:29:45,960][INFO ][transport] [bi-es1]
>>> bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/
>>> 192.168.83.231:9300]}
>>> [2015-01-06 21:29:45,982][INFO ][discovery] [bi-es1]
>>> bi-cluster1/VyTzSKWnQAS9ZCk971fJEg
>>> [2015-01-06 21:29:51,906][WARN ][transport.netty  ] [bi-es1]
>>> Message not fully read (request) for [28805963] and action
>>> [discovery/zen/join/validate], resetting
>>> [2015-01-06 21:29:51,915][INFO ][discovery.zen] [bi-es1]
>>> failed to send join request to master
>>> [[bi-es4][-cczOZfKTw-duMdUO_YSAw][bi-es4][inet[/192.168.83.234:9300]]{master=true}],
>>> reason 
>>> [RemoteTransportException[[bi-es4][inet[/192.168.83.234:9300]][discovery/zen/join]];
>>> nested: 
>>> RemoteTransportException[[bi-es1][inet[/192.168.83.231:9300]][discovery/zen/join/validate]];
>>> nested: ElasticsearchIllegalArgumentException[No custom index metadata
>>> factory registered for type [rivers]]; ]
>>> [2015-01-06 21:29:57,028][WARN ][transport.netty  ] [bi-es1]
>>> Message not fully read (request) for [28807127] and action
>>> [discovery/zen/join/validate], resetting
>>> [2015-01-06 21

Re: How can I store 2 different data types in same field of 2 different document?

2015-01-08 Thread Radu Gheorghe
Hi Paresh,

You're welcome. I'm this
<http://stuffmysisterswilllike.files.wordpress.com/2012/07/victory-kid.jpg>
glad I nailed it!

Best regards,
Radu
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Fri, Jan 9, 2015 at 9:25 AM, Paresh Behede  wrote:

> Thank you so much Radu... the solution worked for me...
>
> Regards,
> Paresh B.
>
> On Thursday, 8 January 2015 21:11:47 UTC+5:30, Radu Gheorghe wrote:
>>
>> Thanks, David! I had no idea it works until... about one hour ago :)
>>
>> --
>> Performance Monitoring * Log Analytics * Search Analytics
>> Solr & Elasticsearch Support * http://sematext.com/
>>
>> On Thu, Jan 8, 2015 at 4:01 PM, David Pilato  wrote:
>>
>>> Very nice Radu. I love this trick. :)
>>>
>>> --
>>> *David Pilato* | *Technical Advocate* | *Elasticsearch.com
>>> <http://Elasticsearch.com>*
>>> @dadoonet <https://twitter.com/dadoonet> | @elasticsearchfr
>>> <https://twitter.com/elasticsearchfr> | @scrutmydocs
>>> <https://twitter.com/scrutmydocs>
>>>
>>>
>>>
>>> Le 8 janv. 2015 à 14:43, Radu Gheorghe  a écrit
>>> :
>>>
>>> Hi Paresh,
>>>
>>> If you want to sort on the field, I think it has to be the same type. So
>>> if you make everything a double, it should work for all numeric fields. To
>>> do that, you can use dynamic templates
>>> <http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/mapping-root-object-type.html#_dynamic_templates>.
>>> For example if you have this:
>>>
>>>   "mappings" : {
>>>     "_default_" : {
>>>       "dynamic_templates" : [ {
>>>         "long_to_float" : {
>>>           "match" : "*",
>>>           "match_mapping_type" : "long",
>>>           "mapping" : {
>>>             "type" : "float"
>>>           }
>>>         }
>>>       } ]
>>>     }
>>>   }
>>>
>>> And add a new field with value=32, the field would be mapped as float
>>> instead of long.
>>>
>>> Best regards,
>>> Radu
>>> --
>>> Performance Monitoring * Log Analytics * Search Analytics
>>> Solr & Elasticsearch Support * http://sematext.com/
>>>
>>> On Thu, Jan 8, 2015 at 11:14 AM, Paresh Behede 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have requirement of storing document in elastic search which will
>>>> have dynamic fields + those fields could have different data types 
>>>> values...
>>>>
>>>> For e.g.,
>>>> Document 1 could have age field with value = 32, so when I would insert
>>>> 1st document in ES my index mapping will get created and age will be mapped
>>>> to Integer/Long
>>>>
>>>> Now if I get age = 32.5 in another document ES will throw me exception
>>>> of data type mismatch...
>>>>
>>>> Can you suggest what can I do to handle such scenario?
>>>>
>>>> As workaround we are creating different fields for different data types
>>>> like age.long / age.double but this also won't work if I have to do sorting
>>>> over age field...
>>>>
>>>> Kindly suggest...
>>>>
>>>> Thanks in advance,
>>>> Paresh Behede
>>>>
>>>>
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "elasticsearch" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to elasticsearc...@googlegroups.com.
>>>> To view this discussion on the web visit https://groups.google.com/d/
>>>> msgid/elasticsearch/ec663bd5-cf3b-4a3f-8828-03c4c53d3837%
>>>> 40googlegroups.com
>>>> <https://groups.google.com/d/msgid/elasticsearch/ec663bd5-cf3b-4a3f-8828-03c4c53d3837%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiv

Re: Upgraded node unable to join cluster while attempting cluster upgrade from 1.3.2 to 1.4.2

2015-01-08 Thread Radu Gheorghe
Hello Ben,

Maybe it works if you uninstall the plugin from one node at a time and do a
rolling restart (sticking to 1.3.2), then do the upgrade with another
rolling restart, then install the plugin back again with yet another
rolling restart?

I would understand if you said "no way I do 3 restarts!" :) But maybe this
will help in future: https://github.com/jprante/elasticsearch-plugin-deploy

Best regards,
Radu
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Wed, Jan 7, 2015 at 5:15 AM, Ben Berg  wrote:

> Hello,
> I am attempting to upgrade a 10-node cluster from 1.3.2 to 1.4.2. I
> upgraded the first node and removed and reinstalled the latest versions of
> the plugins - the two non-site plugins are river-twitter (version 2.4.1) and
> jdbc (version 1.4.0.8; I also tried 1.4.0.7) - and when starting the node I
> see the errors below in the logs on the upgraded node, and the node does not
> join the cluster. If I downgrade to 1.3.2, uninstall the plugins, and
> reinstall jdbc river plugin version 1.3.0.4, the node joins the cluster
> properly.
> Errors from logs:
> [2015-01-06 21:29:39,970][INFO ][node ] [bi-es1]
> version[1.4.2], pid[5910], build[927caff/2014-12-16T14:11:12Z]
> [2015-01-06 21:29:39,971][INFO ][node ] [bi-es1]
> initializing ...
> [2015-01-06 21:29:40,009][INFO ][plugins  ] [bi-es1]
> loaded [river-twitter, jdbc-1.4.0.7-a875ced], sites [head, kopf, bigdesk,
> paramedic, HQ, whatson]
> [2015-01-06 21:29:45,587][INFO ][node ] [bi-es1]
> initialized
> [2015-01-06 21:29:45,588][INFO ][node ] [bi-es1]
> starting ...
> [2015-01-06 21:29:45,960][INFO ][transport] [bi-es1]
> bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/
> 192.168.83.231:9300]}
> [2015-01-06 21:29:45,982][INFO ][discovery] [bi-es1]
> bi-cluster1/VyTzSKWnQAS9ZCk971fJEg
> [2015-01-06 21:29:51,906][WARN ][transport.netty  ] [bi-es1]
> Message not fully read (request) for [28805963] and action
> [discovery/zen/join/validate], resetting
> [2015-01-06 21:29:51,915][INFO ][discovery.zen] [bi-es1]
> failed to send join request to master
> [[bi-es4][-cczOZfKTw-duMdUO_YSAw][bi-es4][inet[/192.168.83.234:9300]]{master=true}],
> reason 
> [RemoteTransportException[[bi-es4][inet[/192.168.83.234:9300]][discovery/zen/join]];
> nested: 
> RemoteTransportException[[bi-es1][inet[/192.168.83.231:9300]][discovery/zen/join/validate]];
> nested: ElasticsearchIllegalArgumentException[No custom index metadata
> factory registered for type [rivers]]; ]
> [2015-01-06 21:29:57,028][WARN ][transport.netty  ] [bi-es1]
> Message not fully read (request) for [28807127] and action
> [discovery/zen/join/validate], resetting
> [2015-01-06 21:29:57,036][INFO ][discovery.zen] [bi-es1]
> failed to send join request to master
> [[bi-es4][-cczOZfKTw-duMdUO_YSAw][bi-es4][inet[/192.168.83.234:9300]]{master=true}],
> reason 
> [RemoteTransportException[[bi-es4][inet[/192.168.83.234:9300]][discovery/zen/join]];
> nested: 
> RemoteTransportException[[bi-es1][inet[/192.168.83.231:9300]][discovery/zen/join/validate]];
> nested: ElasticsearchIllegalArgumentException[No custom index metadata
> factory registered for type [rivers]]; ]
> [2015-01-06 21:30:02,245][WARN ][transport.netty  ] [bi-es1]
> Message not fully read (request) for [28808254] and action
> [discovery/zen/join/validate], resetting
> [2015-01-06 21:30:02,252][INFO ][discovery.zen] [bi-es1]
> failed to send join request to master
> [[bi-es4][-cczOZfKTw-duMdUO_YSAw][bi-es4][inet[/192.168.83.234:9300]]{master=true}],
> reason 
> [RemoteTransportException[[bi-es4][inet[/192.168.83.234:9300]][discovery/zen/join]];
> nested: 
> RemoteTransportException[[bi-es1][inet[/192.168.83.231:9300]][discovery/zen/join/validate]];
> nested: ElasticsearchIllegalArgumentException[No custom index metadata
> factory registered for type [rivers]]; ]
> [2015-01-06 21:30:07,576][WARN ][transport.netty  ] [bi-es1]
> Message not fully read (request) for [28809377] and action
> [discovery/zen/join/validate], resetting
> [2015-01-06 21:30:07,583][INFO ][discovery.zen] [bi-es1]
> failed to send join request to master
> [[bi-es4][-cczOZfKTw-duMdUO_YSAw][bi-es4][inet[/192.168.83.234:9300]]{master=true}],
> reason 
> [RemoteTransportException[[bi-es4][inet[/192.168.83.234:9300]][discovery/zen/join]];
> nested: 
> RemoteTransportException[[bi-es1][inet[/192.168.83.231:9300]][discovery/zen/join/validate]];
> nested: ElasticsearchIllegalArgumentException[No custom index metadata
> factory registered for type [rivers]]; ]
> [2015-01-06 21:30:12,689][WARN ][transport.netty  ] [bi-es1]
> Message not fully read (request) for [28810050] and action
> [discovery/zen/join/validate], resetting
> [2015-01-06 21:30:12,696][INFO ][discovery.zen] [bi-es1]
> failed to send join request 

Re: How can I store 2 different data types in same field of 2 different document?

2015-01-08 Thread Radu Gheorghe
Thanks, David! I had no idea it works until... about one hour ago :)

--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Thu, Jan 8, 2015 at 4:01 PM, David Pilato  wrote:

> Very nice Radu. I love this trick. :)
>
> --
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com
> <http://Elasticsearch.com>*
> @dadoonet <https://twitter.com/dadoonet> | @elasticsearchfr
> <https://twitter.com/elasticsearchfr> | @scrutmydocs
> <https://twitter.com/scrutmydocs>
>
>
>
> Le 8 janv. 2015 à 14:43, Radu Gheorghe  a
> écrit :
>
> Hi Paresh,
>
> If you want to sort on the field, I think it has to be the same type. So
> if you make everything a double, it should work for all numeric fields. To
> do that, you can use dynamic templates
> <http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/mapping-root-object-type.html#_dynamic_templates>.
> For example if you have this:
>
>   "mappings" : {
>     "_default_" : {
>       "dynamic_templates" : [ {
>         "long_to_float" : {
>           "match" : "*",
>           "match_mapping_type" : "long",
>           "mapping" : {
>             "type" : "float"
>           }
>         }
>       } ]
>     }
>   }
>
> And add a new field with value=32, the field would be mapped as float
> instead of long.
>
> Best regards,
> Radu
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
> On Thu, Jan 8, 2015 at 11:14 AM, Paresh Behede 
> wrote:
>
>> Hi,
>>
>> I have requirement of storing document in elastic search which will have
>> dynamic fields + those fields could have different data types values...
>>
>> For e.g.,
>> Document 1 could have age field with value = 32, so when I would insert
>> 1st document in ES my index mapping will get created and age will be mapped
>> to Integer/Long
>>
>> Now if I get age = 32.5 in another document ES will throw me exception of
>> data type mismatch...
>>
>> Can you suggest what can I do to handle such scenario?
>>
>> As workaround we are creating different fields for different data types
>> like age.long / age.double but this also won't work if I have to do sorting
>> over age field...
>>
>> Kindly suggest...
>>
>> Thanks in advance,
>> Paresh Behede
>>
>>
>>
>>
>
>
>
>
>



Re: concurrent search request to elasticsearch

2015-01-08 Thread Radu Gheorghe
OK, now it makes sense. 5 requests with 320 shards might saturate your
queue.

But 320 shards sounds like a lot for one index. I assume you don't need to
scale that very index to 320 nodes (+ replicas). If you can get the number
of shards down (say, to the default of 5) things will surely look better
not only from the queue's perspective, but it should also improve search
performance.
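The back-of-the-envelope math behind "5 requests with 320 shards might saturate your queue" looks like this (a sketch; it assumes each search fans out to every shard and the shard-level requests land on one node's queue):

```python
# Why 5 concurrent searches on a 320-shard index can overflow a
# search threadpool queue of 1000 (single-node worst case).
shards = 320               # shards in the queried index
queue_size = 1000          # search threadpool queue size from the thread
concurrent_searches = 5

shard_requests = shards * concurrent_searches  # one queue entry per shard
print(shard_requests)               # -> 1600
print(shard_requests > queue_size)  # -> True: rejections become likely
```

With the default of 5 shards instead, the same 5 searches produce only 25 shard-level requests, which is why shrinking the shard count helps the queue as well as search latency.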

--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Thu, Jan 8, 2015 at 3:46 PM, vipins  wrote:

> Sorry, I was wrong about the number of shards. The actual number of shards
> is 320 for the index which I am querying.
>
> We are using rolling indices on a daily basis.
>
> The max queue size is 1000 for the search thread pool.
>
> We got past the "None of the configured nodes are available" issue by
> keeping the TCP connection alive (keep-alive set to true).
>
>
>
>
> --
> View this message in context:
> http://elasticsearch-users.115913.n3.nabble.com/concurrent-search-request-to-elasticsearch-tp4068702p4068711.html
> Sent from the ElasticSearch Users mailing list archive at Nabble.com.
>
>



Re: How can I store 2 different data types in same field of 2 different document?

2015-01-08 Thread Radu Gheorghe
Hi Paresh,

If you want to sort on the field, I think it has to be the same type. So if
you make everything a double, it should work for all numeric fields. To do
that, you can use dynamic templates
<http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/mapping-root-object-type.html#_dynamic_templates>.
For example if you have this:

  "mappings" : {
    "_default_" : {
      "dynamic_templates" : [ {
        "long_to_float" : {
          "match" : "*",
          "match_mapping_type" : "long",
          "mapping" : {
            "type" : "float"
          }
        }
      } ]
    }
  }

Then when you add a new field with value=32, the field would be mapped as float
instead of long.
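To see what the template does, here is a small, hedged simulation of the dynamic-mapping decision (the helper names `detect_type`/`mapped_type` are mine for illustration, not an Elasticsearch API):

```python
# Simplified sketch of how a "long_to_float" dynamic template rewrites
# the detected type of a new field. Illustrative only -- this mimics the
# matching rule, it is not the Elasticsearch API.
TEMPLATE = {"match": "*", "match_mapping_type": "long",
            "mapping": {"type": "float"}}

def detect_type(value):
    # ES detects whole numbers as "long" and decimals as "double"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"

def mapped_type(field_name, value, template=TEMPLATE):
    # "match": "*" matches any field name; if the detected type equals
    # match_mapping_type, the template's mapping type wins
    detected = detect_type(value)
    if template["match_mapping_type"] == detected:
        return template["mapping"]["type"]
    return detected

print(mapped_type("age", 32))    # -> float (template applied, not long)
print(mapped_type("age", 32.5))  # -> double (template does not match)
```

The point of the trick: once `age` is mapped as a floating-point type, both 32 and 32.5 index and sort without a type-mismatch error.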

Best regards,
Radu
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Thu, Jan 8, 2015 at 11:14 AM, Paresh Behede  wrote:

> Hi,
>
> I have a requirement of storing documents in Elasticsearch which will have
> dynamic fields, and those fields could have values of different data types...
>
> For example,
> document 1 could have an age field with value = 32, so when I insert the
> 1st document in ES my index mapping will get created and age will be mapped
> to Integer/Long.
>
> Now if I get age = 32.5 in another document, ES will throw me a data type
> mismatch exception...
>
> Can you suggest what can I do to handle such scenario?
>
> As workaround we are creating different fields for different data types
> like age.long / age.double but this also won't work if I have to do sorting
> over age field...
>
> Kindly suggest...
>
> Thanks in advance,
> Paresh Behede
>
>
>



Re: concurrent search request to elasticsearch

2015-01-08 Thread Radu Gheorghe
You're welcome.

So you're saying you're running 5 searches on a single index with 5 shards
(25 per-shard queries in total) and you're getting an error? I assume that
error doesn't say the queue is full, because the queue size is 1000. Can you
post the full error and also a gist where you reproduce the issue? I might be
missing an essential bit here.

Best regards,
Radu
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Thu, Jan 8, 2015 at 3:14 PM, vipins  wrote:

> Thanks a lot for your detailed response.
> We have all default settings only: single node and 5 shards. But there
> are a lot of indices with a huge number of records.
> Search settings:
>   "threads" : 12,
>   "queuesize" : 1000,
>
> My query is very simple and runs on a single index only.
>
> Even with 5 requests it is intermittently throwing "None of the configured
> nodes are available".
>
> Thanks,
>
>
>
> --
> View this message in context:
> http://elasticsearch-users.115913.n3.nabble.com/concurrent-search-request-to-elasticsearch-tp4068702p4068707.html
> Sent from the ElasticSearch Users mailing list archive at Nabble.com.
>
>



Re: Regarding node architecture initial setup

2015-01-08 Thread Radu Gheorghe
Hello Phani,

Usually the dedicated masters are much smaller than the data nodes, because
they have much less work to do. If the 4 nodes you're talking about are
equal, it might be inefficient to add a 5th so you can have 2 data and 3
master nodes. Maybe for the same budget of adding the 5th you can add 3
small master nodes and keep the 4 as data nodes. Then you'd have
minimum_master_nodes=2. This is the ideal case IMO.

If you don't have lots of data, you can have a setup with 3 nodes: all
master-eligible, only two of them holding data, with the 3rd being a
dedicated master that acts as a tie breaker. The downside is that when
both data nodes are super-busy you won't have a cluster (the tie
breaker won't be able to get a quorum). But then again, if both your data
nodes are unresponsive, having a working cluster has little value.

You can extend this setup with your 4 nodes: all master-eligible, 3 data
and one dedicated master. You'll have to increase minimum_master_nodes to 3
(otherwise you can have a split-brain). Your cluster will still tolerate
one node going down as in the previous case, but you'll have more capacity.
This might be the best bet (capacity vs safety) if you absolutely have to
stick with the 4 servers.
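For the 4-node variant (3 data nodes plus one dedicated master, all master-eligible), the per-node settings in elasticsearch.yml on the 1.x line would look roughly like this (a sketch; adjust to your own cluster and verify the keys against your version's docs):

```yaml
# On the three data nodes (master-eligible and holding data):
node.master: true
node.data: true

# On the dedicated master (master-eligible, no data) use instead:
#   node.master: true
#   node.data: false

# On every node: quorum of 4 master-eligible nodes is 4/2+1 = 3
discovery.zen.minimum_master_nodes: 3
```

With minimum_master_nodes at 3, any single node can go down and the remaining three can still elect a master, while a 2/2 split-brain is impossible.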

Best regards,
Radu
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Thu, Jan 8, 2015 at 11:53 AM,  wrote:

> Hi All
>
>  I have chosen to set up 4 nodes in my cluster. I have read about the
> concept of dedicated master nodes and data-only nodes in Elasticsearch.
> Please explain briefly how I can establish a cluster using the above four
> nodes.
>
>  Suppose I choose N/2+1 for 4 nodes: the minimum number of master
> nodes would be 3, so only one node is left in my cluster. Master nodes only
> manage the cluster and route indexing to the other nodes, i.e. data nodes.
> To implement a replica, do we need 5 nodes, since I am left with only 1 data
> node where I can keep the replica?
>
>  Otherwise, will the primary shard reside on any one of the master nodes
> and the replica be held by my data node? Please explain how to design the
> above scenarios with my four nodes in the cluster.
>
> Thanks
>
> phani
>
>



Re: concurrent search request to elasticsearch

2015-01-08 Thread Radu Gheorghe
Hello,

The search threadpool size (that is, how many requests can be actually
worked on at once) defaults to 3 times the number of processors. This might
be reduced in future, though, see:
https://github.com/elasticsearch/elasticsearch/pull/9165

The queue size (how many requests ES can accept before starting to reject
them) defaults to 1000.

From what I understand, this is per node, so the answer to your question
depends on how many processors you have and how many shards get hit by each
search. For example, if a search runs on 3 indices, each with 2 shards
(number of replicas is irrelevant, because the search will only hit one
complete set of data) you'll get 6 requests in the threadpool per search.
If you have two servers with 8 cores each, you have 8*3*2=48 threads available.
So the cluster can work on 8 requests at once. On top of that it can still
queue round-down(2 nodes * 1000 queue size/6 requests per search)=333
searches until it starts rejecting them.
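The arithmetic above can be sketched as a quick script (the numbers are the hypothetical 2-node, 8-core cluster from this example):

```shell
#!/bin/sh
# Capacity math for the default search threadpool (hypothetical cluster:
# 2 nodes, 8 cores each, each search fanning out to 6 shards).
NODES=2
CORES=8
THREADS_PER_CORE=3       # default search threadpool size: 3 * processors
QUEUE_SIZE=1000          # default search queue size, per node
SHARDS_PER_SEARCH=6      # e.g. 3 indices * 2 shards each

THREADS=$((NODES * CORES * THREADS_PER_CORE))        # total search threads
CONCURRENT=$((THREADS / SHARDS_PER_SEARCH))          # searches worked on at once
QUEUED=$((NODES * QUEUE_SIZE / SHARDS_PER_SEARCH))   # searches that can queue

echo "threads=$THREADS concurrent=$CONCURRENT queued=$QUEUED"
# prints: threads=48 concurrent=8 queued=333
```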

Regarding your scaling question, I can't give you a direct answer,
unfortunately, because it depends on a whole lot of variables, mainly what
your data, queries and hardware look like and what can be changed. The fact
that your threadpool queue got full is just a symptom; it's not clear to me
what happens in there. I usually see this when there are lots of indices
and/or those indices have lots of shards. A single search then takes up a
lot of slots in the threadpool, filling it up even if the ES cluster can
keep up with the load. If that's your case, increase the threadpool queue
size and make sure you don't have too many shards per index.

If your cluster can't keep up with the load (a monitoring tool like SPM
should show you that), then the first step is to
see where the bottleneck is. Again, monitoring can give some insight: are
queries too expensive, can they be optimized? do you have too many cache
evictions? is the heap size too large or too small? is memory, I/O or CPU
the bottleneck? Things like that. It could also be that you need
more/different hardware.

Finally, you can make scaling ES someone else's problem by using a hosted
service like Logsene. Especially
if you're using ES for log- or metric-like data, you'll get lots of
features out of the box, and we expose most of the ES API to plug in your
custom stuff.

Best regards,
Radu
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Thu, Jan 8, 2015 at 1:32 PM, vipins  wrote:

> What is the maximum limit on concurrent search requests with default
> Elasticsearch server settings?
>
> I am able to perform only 5 parallel search requests in my application with
> default settings.
>
> How can we improve the scalability of ES search requests, apart from
> increasing the number of nodes, shards and the queue size of the search
> thread pool?
>
> thanks.
>
>
>
> --
> View this message in context:
> http://elasticsearch-users.115913.n3.nabble.com/concurrent-search-request-to-elasticsearch-tp4068702.html
> Sent from the ElasticSearch Users mailing list archive at Nabble.com.
>



Re: Rejected execution exception for multiple parallel requests.

2015-01-07 Thread Radu Gheorghe
Hello,

I assume you query lots of shards/indices? If not, then it might just be
that ES is overloaded with that many requests and you have to add nodes.

If yes, you can increase the queue size of the search thread pool.
Something like:

curl -XPUT localhost:9200/_cluster/settings -d '{
  "persistent" : {
    "threadpool.search.queue_size" : 5000
  }
}'

Note that ES will use more memory to keep the search requests until they
get executed.
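To see whether the queue is actually filling up, you can watch the search threadpool per node. A sketch using the _cat API (host and port are placeholders, and this assumes a running cluster):

```shell
# Show active threads, queued requests and rejections for the search pool
curl -s "localhost:9200/_cat/thread_pool?v&h=host,search.active,search.queue,search.rejected"
```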

Best regards,
Radu
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Tue, Jan 6, 2015 at 8:44 AM, vipins  wrote:

> Hi,
>
> I am querying elasticsearch for multiple parallel requests using single
> transport client instance in my application.
>
> I got the below exception during parallel execution. How can I overcome
> the issue?
>
> org.elasticsearch.common.util.concurrent.EsRejectedExecutionException:
> rejected execution (queue capacity 1000) on
> org.elasticsearch.search.action.SearchServiceTransportAction$23@5f804c60
> at
>
> org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:62)
> at
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
> at
>
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
> at
>
> org.elasticsearch.search.action.SearchServiceTransportAction.execute(SearchServiceTransportAction.java:509)
> at
>
> org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteScan(SearchServiceTransportAction.java:441)
> at
>
> org.elasticsearch.action.search.type.TransportSearchScanAction$AsyncAction.sendExecuteFirstPhase(TransportSearchScanAction.java:68)
> at
>
> org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:171)
> at
>
> org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.start(TransportSearchTypeAction.java:153)
> at
>
> org.elasticsearch.action.search.type.TransportSearchScanAction.doExecute(TransportSearchScanAction.java:52)
> at
>
> org.elasticsearch.action.search.type.TransportSearchScanAction.doExecute(TransportSearchScanAction.java:42)
> at
>
> org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
> at
>
> org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:107)
> at
>
> org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:43)
> at
>
> org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
> at
>
> org.elasticsearch.action.search.TransportSearchAction$TransportHandler.messageReceived(TransportSearchAction.java:124)
> at
>
> org.elasticsearch.action.search.TransportSearchAction$TransportHandler.messageReceived(TransportSearchAction.java:113)
> at
>
> org.elasticsearch.transport.netty.MessageChannelHandler.handleRequest(MessageChannelHandler.java:212)
> at
>
> org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:109)
> at
>
> org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> at
>
> org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
> at
>
> org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
> at
>
> org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
> at
>
> org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
> at
>
> org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
> at
>
> org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
> at
>
> org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> at
>
> org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
> at
>
> org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
> at
>
> org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
> at
>
> org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.jav

Re: Regex + simple word match

2015-01-07 Thread Radu Gheorghe
Hi Amit,

You'll probably need to use a multi field
(one with the standard analyzer, one with the keyword analyzer). This should
return the string for both:

message:"string"

and

message.raw:"this.*string"
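A minimal sketch of such a mapping (the index and type names are hypothetical; the sub-field is conventionally called "raw"):

```shell
# Multi-field: "message" analyzed with the standard analyzer for word
# matches, "message.raw" with the keyword analyzer for regex/wildcard use.
curl -XPUT "localhost:9200/logs" -d '{
  "mappings": {
    "event": {
      "properties": {
        "message": {
          "type": "string",
          "analyzer": "standard",
          "fields": {
            "raw": { "type": "string", "analyzer": "keyword" }
          }
        }
      }
    }
  }
}'
```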

Best regards,
Radu
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Tue, Jan 6, 2015 at 10:44 AM, Amit  wrote:
> The default analyzer is standard. If I change it to keyword I can get
regex
> working. But I want both to work simultaneously.
> For ex, Lets say I push this event to elasticsearch via logstash "this is
my
> new string".
> In kibana search,
>  If I look for message:"string", it should return me "this is my new
string"
>  If I look for message:"this.*string", it should return me "this is my new
> string"
>



Re: Rolling Restart an Elasticsearch Cluster with Ansible

2014-09-17 Thread Radu Gheorghe
Hello Lance,

This looks really nice and useful, thanks for sharing!

When you send a cluster health command, you can also tell it to 
wait_for_status=green or wait_for_status=yellow with a specified timeout:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-health.html#request-params

This seems like a nicer approach than repeating the request and verifying 
the outcome from Ansible. But then again, if you've just restarted a node, 
you don't know when that node will respond to requests at all. So waiting 
for green might work on the last step, but maybe not when you restart each 
node (waiting for yellow, since you disabled shard allocation), unless the 
init script ensures ES is responsive before it returns successfully.
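For reference, a health check that blocks until the cluster is at least yellow, or gives up after a timeout, might look like this (host and timeout are placeholders):

```shell
# Returns when the cluster reaches yellow (or better), or after 60s,
# whichever comes first; check the "timed_out" field in the response.
curl -s "localhost:9200/_cluster/health?wait_for_status=yellow&timeout=60s"
```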

Best regards,
Radu

On Wednesday, September 17, 2014 10:43:10 PM UTC+3, Lance A. Brown wrote:
>
> I've come up with what I think is a safe way to rolling restart an 
> Elasticsearch cluster using Ansible handlers. 
>
> Why is this needed? 
>
> Even if you use a serial setting to limit the number of nodes processed 
> at one time, Ansible will restart elasticsearch nodes and continue 
> processing as soon as the elasticsearch service restart reports itself 
> complete.  This pushes the cluster into a red state due to multiple data 
> nodes being restarted at once, and can cause performance problems. 
>
> Solution: 
>
> Perform a rolling restart of each elasticsearch node and wait for the 
> cluster to stabilize before continuing processing nodes.  This set of 
> chained handlers restarts each node while keeping the cluster from 
> thrashing on reallocating shards during the process. 
>
> A gist of the handlers/main.yml file for my Elasticsearch role is at 
> https://gist.github.com/labrown/5341ebec47bfba6dd7d4 
>
> I welcome any comments and/or suggestions. 
>
> --[Lance] 
>
>



Any clues about transport connection issues on AWS HVM instances?

2014-06-19 Thread Radu Gheorghe
Hi Elasticsearch list :)

I'm having some trouble while running Elasticsearch on r3.large (HVM
virtualization) instances in AWS. The short story is that, as soon as I put
any significant load on them, some requests take a very long time (for
example, Indices Stats) and I see disconnected/timeout errors in the logs.
Did anyone else experience similar things or has any ideas of another
solution than avoiding HVM instances?

More detailed symptoms:
- if there's very little load on them (say, 2GB of data on each node, few
queries and indexing operations) all is well
- by "significant load", I mean some 10GB of data, a few queries per
minute, 100 docs indexed per second (4K per doc, <10 fields). By no means
"overload", CPU rarely tops 20%, no significant GC, nothing suspicious in
any of the metrics SPM collects. The only clue
is that, for the time the problem appears, we get heartbeat alerts because
requests to the stats APIs take too long
- by "some requests take very long time", I mean that some queries take
milliseconds (as I would expect), and some take 10 minutes or so.
Eventually succeeding (at least this was the case for the manual requests
I've sent)
- sometimes, nodes get temporarily dropped from the cluster, but then
things quickly come back to green. However, sometimes shards were stuck
while relocating

Things I've tried:
- different ES versions and machine sizes: the same problem seems to appear
on 0.90.7 with r3.xlarge instances, I'm on 1.1.1 with r3.large
- tore down all machines, launched new ones and redeployed. Same
thing
- different JVM (1.7) versions: Oracle u25, u45, u55, u60, OpenJDK u51.
Same thing everywhere
- spawned the same number of machines with m3.large (same specs as
r3.large, except for half of the RAM, paravirtual instead of HVM). The
problem magically went away with the same data and load

Here are some Node Disconnected exceptions:
[2014-06-18 13:05:35,058][WARN ][search.action] [es01] Failed
to send release search context
org.elasticsearch.transport.NodeDisconnectedException:
[es02][inet[/10.140.1.84:9300]][search/freeContext] disconnected
[2014-06-18 13:05:35,058][DEBUG][action.admin.indices.stats] [es01]
[83f0223f-4222-4a57-a918-ff424924f002_2014-05-20][1],
node[oOlO-iewR3qnAuQkT28vfw], [P], s[STARTED]: Failed to execute
[org.elasticsearch.action.admin.indices.stats.IndicesStatsRequest@3339f285]
org.elasticsearch.transport.NodeDisconnectedException:
[es02][inet[/10.140.1.84:9300]][indices/stats/s] disconnected

I've enabled TRACE logging on both transport and discovery and all I see is
connection timeouts and exceptions, like:

07:29:19,039][TRACE][transport.netty ] [es01] close connection exception
caught on transport layer [[id: 0x190d8444]], disconnecting from relevant
node

Or, more verbose:

[2014-06-16 07:29:19,060][TRACE][transport.netty  ] [es01] connect
exception caught on transport layer [[id: 0x6816c0fe]]
org.elasticsearch.common.netty.channel.ConnectTimeoutException: connection
timed out: es03/10.171.39.244:9300
at
org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:137)
at
org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:83)
at
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
at
org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
at
org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at
org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2014-06-16 07:29:19,060][TRACE][discovery.zen.ping.unicast] [es01] [1]
failed to connect to [#zen_unicast_7#][es01][inet[es04/10.79.155.249:9300]]
org.elasticsearch.transport.ConnectTransportException: [][inet[es04/
10.79.155.249:9300]] connect_timeout[30s]
at
org.elasticsearch.transport.netty.NettyTransport.connectToChannelsLight(NettyTransport.java:683)
at
org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:643)
at
org.elasticsearch.transport.netty.NettyTransport.connectToNodeLight(NettyTransport.java:610)
at
org.elasticsearch.transport.TransportService.connectToNodeLight(TransportService.java:133)
at
org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing$3.run(UnicastZenPing.java:279)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.common.netty.channel.ConnectTimeoutException:
connection timed out: es03/10.171.39.244:9300
at
org.elasticsearch.common.netty.channel.socke

Re: Partial word match with singular and plurals: Elasticsearch

2014-05-02 Thread Radu Gheorghe
t;my_stemmer_analyzer"
>>}
>> }
>>  }
>>   }
>>}
>>
>> }'
>>
>> *Available documents:*
>> 1. men’s shaver
>> 2. men’s shavers
>> 3. men’s foil shaver
>> 4. men’s foils shaver
>> 5. men’s foil shavers
>> 6. men’s foils shavers
>> 7. men's foil advanced shaver
>> 8. norelco men's foil advanced shaver
>>
>> *Query:*
>> curl -XPOST "http://localhost:9200/my_improved_index/my_improved_
>> index_type/_search" -d'
>> {
>>"size": 30,
>>"query": {
>>   "bool": {
>>  "should": [
>> {
>>"match": {
>>   "name.untouched": {
>>  "query": "men\"s shaver",
>>  "operator": "and",
>>  "type": "phrase",
>>  "boost": "10"
>>   }
>>}
>> },
>> {
>>"match_phrase": {
>>   "name.name_stemmer": {
>>  "query": "men\"s shaver",
>>  "slop": 5
>>   }
>>}
>> }
>>  ]
>>   }
>>}
>> }'
>>
>> *Returned result:*
>> 1. men's shaver --> correct
>> 2. men's shavers --> correct
>> 3. men's foils shaver --> NOT correct
>> 4. norelco men's foil advanced shaver --> NOT correct
>> 5. men's foil advanced shaver --> NOT correct
>> 6. men's foil shaver --> NOT correct.
>>
>> *Expected result:*
>> 1. men's shaver --> exact phrase match
>> 2. men's shavers --> ZERO word distance + 1 plural
>> 3. men's foil shaver --> 1 word distance
>> 4. men's foils shaver --> 1 word distance + 1 plural
>> 5. men's foil advanced shaver --> 2 word distance
>> 4. norelco men's foil advanced shaver --> 2 word distance
>>
>> Why higher distance document scored higher?
>> Is there any problem with stemmer or nGram settings?
>>
>>
>> On Thursday, May 1, 2014 7:26:02 AM UTC-4, Radu Gheorghe wrote:
>>>
>>> Hi Kruti,
>>>
>>> The short answer is yes, it is possible. Here's one way to do it:
>>>
>>> Have the fields you search on as multi 
>>> field<http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_multi_fields.html>,
>>> where you index them with various settings, like once not-analyzed for
>>> exact matches, once with ngrams to account for typos and so on. You can
>>> query all those sub-fields, and use the multi-match query with best
>>> fields<http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html#type-best-fields>or
>>>  the DisMax
>>> query<http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-dis-max-query.html>to
>>>  wrap all those queries and take the best score (or the best score and a
>>> factor of the other scores by using the tie breaker).
>>>
>>> Now, for the specific requirements you have:
>>> 1. For exact matching, you can skip analysis altogether, and set "index"
>>> to "not_analyzed". Alternatively, you could use the simple 
>>> analyzer<http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-simple-analyzer.html#analysis-simple-analyzer>
>>>  or
>>> something equally "harmless" to allow for some error. You could boost this
>>> kind of query a lot, so that exact matches come out on top
>>> 2. For phrase matches with distance, you can use the match_phrase type
>>> of the match 
>>> query<http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html#_phrase>.
>>> You can configure a *slop* that defines the maximum allowed distance
>>> for a match to show up in your results. Documents with "closer" words
>>> should get higher scores. You would boost this query less than the exact
>>> matches, but more than the following.
>>> 3. For handling plurals, you'd probably need to do some stemming. Have a
>>> look at the snowball token 
>>> filter<http://www.elastics

Re: Read/Write consistency

2014-05-01 Thread Radu Gheorghe
Hi Mohit,

I think the transaction log takes care of that, because there's a copy on
all instances of the same shard, and they need to be in sync.

Best regards,
Radu

--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Thu, May 1, 2014 at 9:57 PM, Mohit Anchlia wrote:

> What's not clear is how does elasticsearch identify what pieces of data is
> missing between the primary and the replica?
>
> On Wed, Apr 30, 2014 at 3:27 AM, Radu Gheorghe wrote:
>
>> Hi Mohit,
>>
>> I'll answer inline.
>>
>> On Mon, Apr 28, 2014 at 4:57 PM, Mohit Anchlia wrote:
>>>
>>>> Trying to understand the following scenarios of consistency in
>>>> elasticsearch:
>>>>
>>>> 1) sync replication - How does elasticsearch deal with consistency
>>>> issue that may arise from 1 node momentarily going down and missing writes
>>>> to it?
>>>>
>>>
>> This depends on the write consistency setting. By default, the operation
>> only succeeds if a quorum of replicas can index the document:
>>
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html#index-consistency
>>
>>
>>>  When the node comes back up, could the reads going to the non-primary
>>>> shards get inconsistent data?
>>>>
>>>
>> No, when the node comes back up it will sync the stuff it missed with the
>> other nodes.
>>
>>
>>> 2) async replication - What happens if replication is slow for some
>>>> reason, could users see inconsistent data?
>>>>
>>>
>> Yes, if you hit a shard that didn't get the latest operation, it could
>> see an "old" version of the data. You can use "preference" to try and hit
>> the primary shard all the time, but then your replicas will just be sitting
>> there for redundancy:
>>
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-preference.html
>>
>>
>>> 3) sync/async replication - how does elasticsearch keep data in sync for
>>>> those writes that never happened on the non-primary shard because of
>>>> network/node failures?
>>>>
>>>
>> It either uses the transaction log or it transfers the whole shard to
>> that node.
>>
>> Best regards,
>> Radu
>> --
>> Performance Monitoring * Log Analytics * Search Analytics
>> Solr & Elasticsearch Support * http://sematext.com/
>>
>



Re: Partial word match with singular and plurals: Elasticsearch

2014-05-01 Thread Radu Gheorghe
Hi Kruti,

The short answer is yes, it is possible. Here's one way to do it:

Have the fields you search on as a multi field,
where you index them with various settings, like once not-analyzed for
exact matches, once with ngrams to account for typos and so on. You can
query all those sub-fields, and use the multi-match query with best
fields or the DisMax query to
wrap all those queries and take the best score (or the best score and a
factor of the other scores by using the tie breaker).

Now, for the specific requirements you have:
1. For exact matching, you can skip analysis altogether, and set "index" to
"not_analyzed". Alternatively, you could use the simple analyzer or
something equally "harmless" to allow for some error. You could boost this
kind of query a lot, so that exact matches come out on top
2. For phrase matches with distance, you can use the match_phrase type of
the match query.
You can configure a *slop* that defines the maximum allowed distance for a
match to show up in your results. Documents with "closer" words should get
higher scores. You would boost this query less than the exact matches, but
more than the following.
3. For handling plurals, you'd probably need to do some stemming. Have a
look at the snowball token filter or the stemmer token filter.
Again, this would be boosted lower than 1) and 2), but more than 4)
4. For handling substrings, you can use ngrams, as you already seem to be
doing. Alternatively, you can pay the price at query time by using the
"fuzziness" option of the match query.
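Putting the four pieces together, a dis_max query over such sub-fields could be sketched like this (field names such as name.untouched, name.stemmed and name.ngram are assumptions that depend on your mapping):

```shell
curl -XPOST "localhost:9200/products/_search" -d '{
  "query": {
    "dis_max": {
      "tie_breaker": 0.3,
      "queries": [
        { "match": { "name.untouched":
            { "query": "foil shaver", "type": "phrase", "boost": 10 } } },
        { "match": { "name.stemmed":
            { "query": "foil shaver", "type": "phrase", "slop": 2, "boost": 3 } } },
        { "match": { "name.ngram": "foil shaver" } }
      ]
    }
  }
}'
```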

Best regards,
Radu
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Thu, May 1, 2014 at 10:48 AM, Kruti Shukla wrote:

> *My final goal is to have following search precedence:*
> 1. Exact phrase match
> 2. Exact word match with incremental distance
> 3. Plurals
> 4. Substring
>
> *Suppose I have following documents:*
> i. men’s shaver
> ii. men’s shavers
> iii. men’s foil shaver
> iv. men’s foils shaver
> v. men’s foil shavers
> vi. men’s foils shavers
>
> *Case 1: *search for : “men’s foil shaver”
> *Expected result:*
> 1. men’s foil shaver <-- exact phrase match
> 2. men’s foil shavers <-- exact word match on 2 of 3 words with 0
> word distance + plural
> 3. men’s foils shaver <-- exact word match on 2 of 3 words with 1
> word distance + plural
> 4. men’s foils shavers <-- exact word match on 1 of 3 words + 2
> plurals
> 5. men’s shaver <-- exact word match on 2 of 3 words (66% match)
> 6. men’s shavers <-- exact word match on 1 of 3 words + plural (66%
> match)
>
> *Case 2: *search for : “men’s foil shavers”
> *Expected result:*
> 1. men’s foil shavers <-- exact phrase match
> 2. men’s foil shaver <-- exact word match on 2 of 3 words with 0 word
> distance + singular
> 3. men’s foils shavers <-- exact word match on 2 of 3 words with 1
> word distance + singular
> 4. men’s foils shaver <-- exact word match on 1 of 3 words + 2
> singulars
> 5. men’s shavers <-- exact word match on 2 of 3 words (66% match)
> 6. men’s shaver <-- exact word match on 1 of 3 words + singular (66%
> match)
>
>
> *Case 3:* search for : “men’s foils shavers”
> *Expected result:*
> 1. men’s foils shavers <-- exact phrase match
> 2. men’s foils shaver <-- exact word match on 2 of 3 words with 0
> word distance + singular
> 3. men’s foil shavers <-- exact word match on 2 of 3 words with 1
> word distance + singular
> 4. men’s foil shaver <-- exact word match on 1 of 3 words + 2
> singulars
> 5. men’s shavers <-- exact word match on 2 of 3 words (66% match)
> 6. men’s shaver <-- exact word match on 1 of 3 words + singular (66%
> match)
>
>
> Is there any way in elasticsearch I can achieve this?
> This question is related to my other question which is not answered yet.
> Link to my other question "
> https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/elasticsearch/ui9OR7JARs4/Mp3oOtTqY0EJ
> ".
>
> Any suggestion would help!
> Thank you.
>

Re: Read/Write consistency

2014-04-30 Thread Radu Gheorghe
Hi Mohit,

I'll answer inline.

On Mon, Apr 28, 2014 at 4:57 PM, Mohit Anchlia wrote:
>
>> Trying to understand the following scenarios of consistency in
>> elasticsearch:
>>
>> 1) sync replication - How does elasticsearch deal with consistency issues
>> that may arise from 1 node momentarily going down and missing writes to it?
>>
>
This depends on the write consistency setting. By default, the operation
only succeeds if a quorum of replicas can index the document:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html#index-consistency


> When the node comes back up, could the reads going to the non-primary
>> shards get inconsistent data?
>>
>
No, when the node comes back up it will sync the stuff it missed with the
other nodes.


> 2) async replication - What happens if replication is slow for some
>> reason, could users see inconsistent data?
>>
>
Yes, if you hit a shard that didn't get the latest operation, it could see
an "old" version of the data. You can use "preference" to try and hit the
primary shard all the time, but then your replicas will just be sitting
there for redundancy:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-preference.html
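As a sketch, such a request would look like this (the index name is a placeholder):

```shell
# Route the search to primary shards only, trading away read scalability
# for more consistent reads.
curl -XPOST "localhost:9200/myindex/_search?preference=_primary" -d '{
  "query": { "match_all": {} }
}'
```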


> 3) sync/async replication - how does elasticsearch keep data in sync for
>> those writes that never happened on the non-primary shard because of
>> network/node failures?
>>
>
It either uses the transaction log or it transfers the whole shard to that
node.

Best regards,
Radu
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/



Re: index binary files

2014-04-30 Thread Radu Gheorghe
Hello,

Normally, you would send indexing requests to the REST API with the stuff
you want Elasticsearch to index:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html

If you want Elasticsearch to automatically fetch files from the file system
for you, have a look at David's FileSystem River:
https://github.com/dadoonet/fsriver
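Since the question mentions the mapper-attachments plugin: with that plugin installed, files are sent base64-encoded in the document body. A rough sketch (index, type and file names are hypothetical; base64 flags may differ per OS):

```shell
# Map a field as "attachment" so the plugin extracts and indexes the content
curl -XPUT "localhost:9200/docs" -d '{
  "mappings": { "doc": { "properties": { "file": { "type": "attachment" } } } }
}'

# Index a PDF: the plugin expects the raw bytes base64-encoded
curl -XPUT "localhost:9200/docs/doc/1" -d "{
  \"file\": \"$(base64 mydoc.pdf | tr -d '\n')\"
}"
```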

Best regards,
Radu
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Tue, Apr 29, 2014 at 6:40 PM, anass benjelloun wrote:

> hello,
>
> I installed Elasticsearch; it works well, I can index and search XML and
> JSON content using Dev HTTP Client.
> I need your help to index binary files in Elasticsearch and then search
> them by content.
> I added mapper-attachments to Elasticsearch, but what I don't know is how
> to specify the folder of PDF or DOCX files to index, something like
> base64 or I don't know.
> Thanks for helping me.
>
>



Re: performance issue with script scoring with fields having a large array

2014-04-30 Thread Radu Gheorghe
Hello,

Using _source for scripts is typically slow, because ES has to go to each
stored document and extract fields from there. A faster approach is to use
something like doc['field3'].values[12], which will use the field data
cache (already loaded in memory, at least after the first run):
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-scripting.html#_document_fields

More details about field data can be found here:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.html
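For the query from the original post, a doc-values version would look roughly like this (a sketch assuming a local node; the index/type/field names are taken from the mapping in the thread):

```shell
# Same scoring logic as the _source version, but reading from the field
# data cache instead of parsing _source for every document:
curl -s 'localhost:9200/test/document/_search?pretty' -d '{
  "query": {
    "function_score": {
      "script_score": {
        "script": "doc[\"field3\"].values[12] * doc[\"field3\"].values[11]"
      }
    }
  }
}'
```

The first run still pays the cost of loading field data into memory; subsequent runs should be much faster.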

Best regards,
Radu
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Wed, Apr 30, 2014 at 12:27 PM, NM  wrote:

> I have documents with a field containing a large array.
>
> I would like to score according to the value of the nth element of such an
> array, but got very slow answers (5 s) for only 10K documents indexed.
>
> my mapping:
> document {
> id: value,
> field2: string,
> field3: [ int_1,int_2, ... , int_10k] <- large array of 10K integers
> }
>
> assume I generated and indexed 10K documents with 1K random integer values
> in the field 'field3'
>
> I then use the following search query
>
> GET /test/document/_search
> {
>   "query":{
>    "function_score":{
>       "script_score" : {
>         "script" : "_source.field3[12] * _source.field3[11]"
>       }
>     }
>   }
> }
>
> => got 5000 ms
>
> however with basic Java object with a simple nested loop:
>
> - for all the documents
>   score[i] =  doc[i].fields[12] * doc[i].fields[11]
> - sort by score
>
> => got < 50 ms
>
> ES is 100x slower than a simple loop.
>
> How to get similar performance with ES?
>
>



Re: Default index analyzer in elasticsearch

2014-04-29 Thread Radu Gheorghe
Hello Piyush,

You could set the default mapping for all strings to be not_analyzed by
using dynamic templates:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-root-object-type.html#_dynamic_templates

Alternatively, you can set the default analyzer to "keyword":
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-analyzers.html#default-analyzers
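A minimal sketch of the first option (ES 1.x syntax; the index and template names are made up):

```shell
# Every new string field in this index will be created as not_analyzed:
curl -XPUT 'localhost:9200/myindex' -d '{
  "mappings": {
    "_default_": {
      "dynamic_templates": [{
        "strings_not_analyzed": {
          "match": "*",
          "match_mapping_type": "string",
          "mapping": { "type": "string", "index": "not_analyzed" }
        }
      }]
    }
  }
}'
```

For the second option, putting `index.analysis.analyzer.default.type: keyword` in the index settings should make the keyword analyzer the default for all analyzed fields.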

Best regards,
Radu

--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Tue, Apr 29, 2014 at 5:06 PM, Piyush Rai  wrote:

> I am facing a problem with Elasticsearch where I don't want my indexed terms
> to be analyzed. But Elasticsearch has a default setting which
> tokenizes them on spaces, so my facet query is not returning the
> results I want.
>
> I read that "index" : "not_analyzed" in the properties of the index type should
> work. But the problem is that I don't know my document structure
> beforehand. I would be indexing random MySQL databases to Elasticsearch without
> knowing the table structure.
>
> How can I set up Elasticsearch so that by default it uses "index" :
> "not_analyzed" unless otherwise asked? Thanks.
>
> PS: I am using Java; if I can directly use an API for this, I would love it.
>
>



Re: [Scaling elastic server] how much load can elastic search handle?

2014-04-29 Thread Radu Gheorghe
Hello Abrar,

The answer to your questions depends a lot on what your data and queries
look like, how often you run them, and how often new data is indexed. You
could paste those details here, but I don't think anyone could give you a
definite answer; at best a guesstimate based on experience with similar
patterns.

The best way to find out is to install some performance monitoring tool
(there are many out there, you can find one by clicking the link in my
signature) and start running tests with production-like data and queries.
And then you'll see how much your machine can handle and where the
bottlenecks are.

Best regards,
Radu
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Tue, Apr 29, 2014 at 4:02 PM, Abrar Sheikh  wrote:

> Hi,
>
> I have a single AWS EC2 large instance with 7.5 GB RAM, a 100 GB
> hard drive and a dual-core 2.6 GHz CPU. My Elasticsearch instance on average
> has around 10,000,000 records. I use somewhat complex queries. I am calling
> the Elasticsearch APIs from my PHP code, which is exposed as a REST service
> (needed to do some post-processing of data). My question is: how much load can
> my server handle, and at what point do I shift to a multi-node architecture?
> What effect does the number of shards and replication have on performance?
> With my current system configuration, how many queries per second (QPS) can
> my Elasticsearch handle?
>
> Thanks and Regards,
> Abrar.
>
>



Re: deleting old records from ES

2014-04-29 Thread Radu Gheorghe
Hi Abrar,

When you run delete by query, documents aren't immediately deleted. They
are marked as deleted and the data gets removed in the process of merging,
which runs asynchronously in the background. There are some settings that
make merging more "sensitive" to deletes, have a look here:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-merge.html

Alternatively, you can force a merge by running optimize:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-optimize.html#indices-optimize

But this is also problematic because it will thrash your I/O and invalidate
much of your caches.
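A gentler variant is to expunge only the deleted documents rather than force a full merge (still I/O-heavy; sketch assuming a local node and an illustrative index name):

```shell
# Merge only segments that contain deletes, instead of optimizing everything:
curl -XPOST 'localhost:9200/myindex/_optimize?only_expunge_deletes=true'
```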

Usually, a better option if you have expiring data is to have time-based
indices (an index per day, for example). You can then use something like
the Elasticsearch Curator to expire entire indices as the days go by:
https://github.com/elasticsearch/curator

Here's a nice blog post about it:
http://www.elasticsearch.org/blog/curator-tending-your-time-series-indices/

Using time-based indices is typically better because removing an entire
index is super-fast (it just deletes the associated files and a bit of
metadata) and doesn't mess with the rest of your data and caches.
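For illustration, expiring a day of data then becomes a single call (the index name below is made up, following the default Logstash daily naming):

```shell
# Drop the whole expired daily index in one shot; no merging involved:
curl -XDELETE 'localhost:9200/logstash-2014.04.26'
```

Curator essentially automates this call for indices older than your retention window.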

Best regards,
Radu

--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Tue, Apr 29, 2014 at 4:04 PM, Abrar Sheikh  wrote:

> Hi,
>
> I would like to know how to rotate old records out of an ES index, because
> when I delete documents with the delete-by-query API it does not free up disk
> space. E.g. I want to remove records older than 3 days in my index. For me
> disk space is an issue.
>
> Thanks and Regards,
> Abrar.
>
>



Re: elasticsearch rpm and configuring garbage collection

2014-04-28 Thread Radu Gheorghe
Hi,

The setting is indices.fielddata.cache.size. You can check out the docs for
more options, like adjusting the circuit breaker:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.html
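As a sketch, the relevant elasticsearch.yml lines might look like this (exact setting names varied slightly across 1.x releases, so double-check against your version's docs):

```
# Cap the field data cache at a slice of the heap:
indices.fielddata.cache.size: 30%
# And have the circuit breaker trip before field data loading blows the heap:
indices.fielddata.breaker.limit: 60%
```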

To change the GC settings, I usually edit the elasticsearch.in.sh script. You
have an interesting point that it might be overridden by an RPM upgrade. I'm
not aware of a way to override them; maybe somebody else is.

Best regards,
Radu
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Mon, Apr 28, 2014 at 10:22 AM, Jilles van Gurp
wrote:

> Sounds like that could be the cause. What setting would I need to
> configure for this? Regardless, I'd like to know where to start with
> configuring garbage collection for ES.
>
> Jilles
>
>
> On Monday, April 28, 2014 8:58:03 AM UTC+2, Radu Gheorghe wrote:
>
>> Hi Jilles,
>>
>> Any idea on why you're running out of memory? You can monitor stuff like
>> field, filter caches and memory pools to get some clues.
>>
>> I would assume your problem is because field data is accumulating, and
>> not because of GC settings. Depending on how much heap, how many nodes you
>> have, and how much heap is used for other things, I'd limit that to a slice of
>> the total memory (for example, 30%).
>>
>> Best regards,
>> Radu
>> --
>> Performance Monitoring * Log Analytics * Search Analytics
>> Solr & Elasticsearch Support * http://sematext.com/
>>
>>
>> On Fri, Apr 25, 2014 at 7:09 PM, Jilles van Gurp wrote:
>>
>>> I've been using the elasticsearch rpms (1.1.1) on our centos 6.5 setup
>>> and I've been wondering about the recommended way to configure it given
>>> that it deploys an init.d script with defaults.
>>>
>>> I figured out that I can use /etc/sysconfig/elasticsearch for things
>>> like heap size. However, /usr/share/elasticsearch/bin/elasticsearch.in.sh
>>> configures some defaults for garbage collection:
>>>
>>> JAVA_OPTS="$JAVA_OPTS -XX:+UseParNewGC"
>>> JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC"
>>>
>>> JAVA_OPTS="$JAVA_OPTS -XX:CMSInitiatingOccupancyFraction=75"
>>> JAVA_OPTS="$JAVA_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
>>>
>>> So, I'm getting some default configuration for garbage collection that I
>>> probably should be tuning; especially given that it is running out of
>>> memory after a few weeks on our setup with kibana and a rather large amount
>>> of logstash indices (over 200GB).
>>>
>>> Is it possible to have a custom garbage collection strategy without
>>> modifying files deployed and overwritten by the rpm? elasticsearch.in.sh
>>> seems specific to the 1.1.1 version given that it also includes the
>>> classpath definition.
>>>
>>> In any case, it might be handy to clarify the recommended way to
>>> configure elasticsearch when deployed using the rpm as opposed to a
>>> developer machine with a tar ball. Most documentation I'm finding seems to
>>> assume the latter.
>>>
>>> Jilles
>>>
>



Re: elasticsearch rpm and configuring garbage collection

2014-04-27 Thread Radu Gheorghe
Hi Jilles,

Any idea on why you're running out of memory? You can monitor stuff like
field, filter caches and memory pools to get some clues.

I would assume your problem is because field data is accumulating, and not
because of GC settings. Depending on how much heap, how many nodes you
have, and how much heap is used for other things, I'd limit that to a slice of
the total memory (for example, 30%).

Best regards,
Radu
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Fri, Apr 25, 2014 at 7:09 PM, Jilles van Gurp wrote:

> I've been using the elasticsearch rpms (1.1.1) on our centos 6.5 setup and
> I've been wondering about the recommended way to configure it given that it
> deploys an init.d script with defaults.
>
> I figured out that I can use /etc/sysconfig/elasticsearch for things like
> heap size. However, /usr/share/elasticsearch/bin/elasticsearch.in.sh
> configures some defaults for garbage collection:
>
> JAVA_OPTS="$JAVA_OPTS -XX:+UseParNewGC"
> JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC"
>
> JAVA_OPTS="$JAVA_OPTS -XX:CMSInitiatingOccupancyFraction=75"
> JAVA_OPTS="$JAVA_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
>
> So, I'm getting some default configuration for garbage collection that I
> probably should be tuning; especially given that it is running out of
> memory after a few weeks on our setup with kibana and a rather large amount
> of logstash indices (over 200GB).
>
> Is it possible to have a custom garbage collection strategy without
> modifying files deployed and overwritten by the rpm? elasticsearch.in.sh seems
> specific to the 1.1.1 version given that it also includes the
> classpath definition.
>
> In any case, it might be handy to clarify the recommended way to configure
> elasticsearch when deployed using the rpm as opposed to a developer machine
> with a tar ball. Most documentation I'm finding seems to assume the latter.
>
> Jilles
>
>



Re: Using the aggregation Framework using a large set of doc IDs as query? ( + bypassing the scoring part)

2014-04-25 Thread Radu Gheorghe
Hi,

Yes, that's what I said, but I didn't know you wanted to use ES only to
enrich the results.

Plus, since you have the IDs already, you might use a huge multi-get
request:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-multi-get.html
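A multi-get sketch (index, type and IDs below are illustrative; assumes a local node):

```shell
# Fetch many documents by ID in a single round-trip:
curl -s 'localhost:9200/myindex/mytype/_mget' -d '{
  "ids": ["id1", "id2", "id3"]
}'
```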

Best regards,
Radu
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Thu, Apr 24, 2014 at 6:25 PM, NM  wrote:

> thanks Radu,
>
> to be sure I understand:
>
> I have a query from a user, run a process A returning a list of IDs
> specific to the query, and would like to use ES to enrich these IDs with
> aggregated info coming from the related (and already indexed) documents.
>
> So the list of IDs in the results is a priori unknown / depends on the
> query of the user.
>
> To use only the aggregation framework, you propose, for each query, to
> first index the results of process A (the list of IDs) as a lookup document,
> and then query ES using a terms filter + the lookup mechanism.
>
> Is that right?
>
>
> Le jeudi 24 avril 2014 13:40:22 UTC+2, Radu Gheorghe a écrit :
>>
>> Hello,
>>
>> One way to do it would be to store all those IDs in an Elasticsearch
>> document. Then, you can use the terms filter with the terms lookup
>> mechanism to have ES fetch all the terms for you:
>> http://www.elasticsearch.org/guide/en/elasticsearch/
>> reference/current/query-dsl-terms-filter.html#_terms_lookup_mechanism
>>
>> As you can see there, you have quite a lot of options for caching.
>>
>> Best regards,
>> Radu
>> --
>> Performance Monitoring * Log Analytics * Search Analytics
>> Solr & Elasticsearch Support * http://sematext.com/
>>
>>
>> On Thu, Apr 24, 2014 at 12:19 PM, NM  wrote:
>>
>>> Hi guys,
>>>
>>> I use my own framework and already get the top N results from a
>>> previous processing step.
>>>
>>> I would like to use the aggregation framework of ES to use facets & co
>>> features  on such results.
>>>
>>> I previously indexed my documents in ES.
>>>
>>> What ES query should I do to avoid the scoring process and run only
>>> the aggregation/facet features, using the IDs of a set of documents as the
>>> query, *knowing that N could be large (N = 1K)*?
>>>
>>> JAVA API
>>>
>>>
>>> Thanks
>>>
>>>
>>
>>
>>
>>   --
>



Re: Spaces in terms in request body make the query return no results

2014-04-25 Thread Radu Gheorghe
Great! You're welcome

Best regards,
Radu

Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Fri, Apr 25, 2014 at 10:30 AM, Alexey Kotlyarov wrote:

> I think that isn't a valid query. I've just tried it and if I put "bla" in
>> there I still get the result back. Basically, it will run a match_all
>> query. It's like doing this:
>>
>> curl http://localhost:9200/twitter/tweet/_search?bla
>>
>> If you want to do an URI search, you need to put things in the "q"
>> parameter: http://www.elasticsearch.org/guide/en/elasticsearch/reference/
>> current/search-uri-request.html
>>
>> But note that a URI search will run a query_string query, which is
>> analyzed: http://www.elasticsearch.org/guide/en/elasticsearch/reference/
>> current/query-dsl-query-string-query.html
>>
>
> I have tested this; you're right, the query submitted in this way is
> equivalent to a "match_all". Thank you for the explanations!
>
>



Re: Use facets to group documents by index and type

2014-04-24 Thread Radu Gheorghe
Great! You're welcome!


On Thu, Apr 24, 2014 at 3:26 PM, Sviatoslav Abakumov <
abakumov.sviatos...@progforce.com> wrote:

> Magnificent! That's what I like to see.
>
> Thank you so much!
>
>
> On Thu, Apr 24, 2014 at 4:23 PM, Radu Gheorghe  > wrote:
>
>> Ah, I see now. Sorry for misreading your initial post. You can do that
>> using scripts. This works for me:
>>
>> curl localhost:9200/_search?pretty -d '{
>> "facets": {
>>   "idxtype": {
>> "terms": {
>>   "script_field": "doc['"'_index'].value + '/' + doc['_type'"'].value"
>> }
>>   }
>> }}'
>>
>> Using doc['field_name'].value is usually preferred to _source or _fields
>> because with doc you're accessing values loaded in memory:
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-scripting.html#_document_fields
>>
>> Best regards,
>> Radu
>> --
>> Performance Monitoring * Log Analytics * Search Analytics
>> Solr & Elasticsearch Support * http://sematext.com/
>>
>> On Thu, Apr 24, 2014 at 2:54 PM, Sviatoslav Abakumov <
>> abakumov.sviatos...@progforce.com> wrote:
>>
>>> Oh, yes, I've enabled `_index` so I can do term facet requests like:
>>>
>>> {'field': '_index'}
>>> {'field': '_type'}
>>>
>>> And even:
>>>
>>> {'fields': ['_index', '_type']}
>>>
>>> There was a hope that this request does what I want, but it merely
>>> unites facets:
>>>
>>> {'_type': 'terms',
>>>  'missing': 0,
>>>  'other': 0,
>>>  'terms': [{'count': 198, 'term': 'comment'},
>>>{'count': 99, 'term': 'light_beer'},
>>>{'count': 99, 'term': 'dark_beer'}]}
>>>
>>> Well, I want to concatenate index and type names and pass it to ES
>>> facets. The only way I see is to use `script`, but I don't know how to
>>> access `_index` and `_type` from there.
>>>
>>> Thank you for understanding.
>>>
>>>
>>> On Thu, Apr 24, 2014 at 3:30 PM, Radu Gheorghe <
>>> radu.gheor...@sematext.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I'm not sure if you're already aware of the predefined _index and _type
>>>> fields:
>>>>
>>>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-index-field.html
>>>>
>>>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-type-field.html
>>>>
>>>> You can enable them so you can search and retrieve values like you
>>>> would with any other field.
>>>>
>>>> Best regards,
>>>> Radu
>>>>
>>>>
>>>>  On Thu, Apr 24, 2014 at 11:28 AM, Sviatoslav Abakumov <
>>>> abakumov.sviatos...@progforce.com> wrote:
>>>>
>>>>>  Hello,
>>>>>
>>>>> I'd like to get how many documents are of a particular index AND type, like
>>>>> this:
>>>>>
>>>>> {'f4': {'_type': 'terms',
>>>>> 'missing': 0,
>>>>> 'other': 0,
>>>>> 'terms': [{'count': 99, 'term': 'light_beer/comment'},
>>>>>   {'count': 99, 'term': 'dark_beer/comment'}],
>>>>> 'total': 198}}
>>>>>
>>>>> To do this I added fields `__index` and `__type` and filled them when
>>>>> I was indexing documents.
>>>>> I use term facet with `"script": "_source.__index + '/' +
>>>>> _source.__type"`
>>>>>
>>>>> Is there a way to get index and type names from `_index` (IndexLookup)
>>>>> or `_source` (SourceLookup)?
>>>>>
>>>>> Thank you.
>>>>>

Re: Use facets to group documents by index and type

2014-04-24 Thread Radu Gheorghe
Ah, I see now. Sorry for misreading your initial post. You can do that
using scripts. This works for me:

curl localhost:9200/_search?pretty -d '{
"facets": {
  "idxtype": {
"terms": {
  "script_field": "doc['"'_index'].value + '/' + doc['_type'"'].value"
}
  }
}}'

Using doc['field_name'].value is usually preferred to _source or _fields
because with doc you're accessing values loaded in memory:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-scripting.html#_document_fields

Best regards,
Radu
-- 
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Thu, Apr 24, 2014 at 2:54 PM, Sviatoslav Abakumov <
abakumov.sviatos...@progforce.com> wrote:

> Oh, yes, I've enabled `_index` so I can do term facet requests like:
>
> {'field': '_index'}
> {'field': '_type'}
>
> And even:
>
> {'fields': ['_index', '_type']}
>
> There was a hope that this request does what I want, but it merely unites
> facets:
>
> {'_type': 'terms',
>  'missing': 0,
>  'other': 0,
>  'terms': [{'count': 198, 'term': 'comment'},
>{'count': 99, 'term': 'light_beer'},
>{'count': 99, 'term': 'dark_beer'}]}
>
> Well, I want to concatenate index and type names and pass it to ES facets.
> The only way I see is to use `script`, but I don't know how to access
> `_index` and `_type` from there.
>
> Thank you for understanding.
>
>
> On Thu, Apr 24, 2014 at 3:30 PM, Radu Gheorghe  > wrote:
>
>> Hello,
>>
>> I'm not sure if you're already aware of the predefined _index and _type
>> fields:
>>
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-index-field.html
>>
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-type-field.html
>>
>> You can enable them so you can search and retrieve values like you would
>> with any other field.
>>
>> Best regards,
>> Radu
>>
>>
>>  On Thu, Apr 24, 2014 at 11:28 AM, Sviatoslav Abakumov <
>> abakumov.sviatos...@progforce.com> wrote:
>>
>>>  Hello,
>>>
>>> I'd like to get how many documents are of a particular index AND type, like
>>> this:
>>>
>>> {'f4': {'_type': 'terms',
>>> 'missing': 0,
>>> 'other': 0,
>>> 'terms': [{'count': 99, 'term': 'light_beer/comment'},
>>>   {'count': 99, 'term': 'dark_beer/comment'}],
>>> 'total': 198}}
>>>
>>> To do this I added fields `__index` and `__type` and filled them when I
>>> was indexing documents.
>>> I use term facet with `"script": "_source.__index + '/' +
>>> _source.__type"`
>>>
>>> Is there a way to get index and type names from `_index` (IndexLookup)
>>> or `_source` (SourceLookup)?
>>>
>>> Thank you.
>>>
>>>
>>
>>
>>
>> --
>> Performance Monitoring * Log Analytics * Search Analytics
>> Solr & Elasticsearch Support * http://sematext.com/
>>

Re: Using the aggregation Framework using a large set of doc IDs as query? ( + bypassing the scoring part)

2014-04-24 Thread Radu Gheorghe
Hello,

One way to do it would be to store all those IDs in an Elasticsearch
document. Then, you can use the terms filter with the terms lookup
mechanism to have ES fetch all the terms for you:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-terms-filter.html#_terms_lookup_mechanism

As you can see there, you have quite a lot of options for caching.
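For illustration, a rough sketch of that approach (all index, type, and field names below are made up, not from the thread; the commented curl lines assume an ES 1.x node at localhost:9200, and the request bodies are only validated locally here):

```shell
# ID list stored as a regular document, then referenced from a terms
# filter via the lookup mechanism. Wrapping the filter in a "filtered"
# query with match_all means every hit gets a constant score, so no real
# scoring work happens -- only the aggregations run.
IDS='{"doc_ids": ["id-1", "id-2", "id-3"]}'
QUERY='{
  "query": {
    "filtered": {
      "query": {"match_all": {}},
      "filter": {
        "terms": {
          "_id": {
            "index": "lookups",
            "type": "idlist",
            "id": "1",
            "path": "doc_ids"
          }
        }
      }
    }
  },
  "aggs": {"by_tag": {"terms": {"field": "tag"}}}
}'

# Sanity-check the JSON bodies locally before sending:
for body in "$IDS" "$QUERY"; do
  echo "$body" | python3 -m json.tool > /dev/null || exit 1
done
echo "bodies ok"

# curl -XPUT  'http://localhost:9200/lookups/idlist/1' -d "$IDS"
# curl -XPOST 'http://localhost:9200/myindex/_search?search_type=count' -d "$QUERY"
```

With `search_type=count` no hits are fetched at all, which also helps when N is large.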

Best regards,
Radu
-- 
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Thu, Apr 24, 2014 at 12:19 PM, NM  wrote:

> Hi guys,
>
> I use my own framework and already get the top N results from a previous
> processing step.
>
> I would like to use the aggregation framework of ES to apply facets and
> related features to such results.
>
> I previously indexed my documents in ES.
>
> What ES query should I use to skip the scoring process and run only the
> aggregations/facets, using the IDs of a set of documents as the query,
> *knowing that N could be large (N = 1K)*?
>
> I'm using the Java API.
>
>
> Thanks
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAHXA0_3cZynh5WmcWFEH0oQFzLWNg-b600Wbu%3DEsOyFiPZBwkA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: High cpu usage after 0.9.13 to 1.0.3

2014-04-24 Thread Radu Gheorghe
Hello,

What happens if you start ES without any plugin? Do you get the same high
CPU load?

If not, then you can try enabling one plugin at a time and see which one is
the cause. Also, make sure that your plugins match your version. For
example, you seem to need the mongodb river to be version 1.7.4 to work
with 1.0.0.
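A sketch of that bisection, assuming a standard package layout (the `ES_HOME` path and plugin names below are illustrative; the default here is a scratch directory so the moves can be tried safely before touching a real install):

```shell
# Point ES_HOME at your real install, e.g. /usr/share/elasticsearch.
ES_HOME="${ES_HOME:-$(mktemp -d)}"
mkdir -p "$ES_HOME/plugins/river-mongodb" "$ES_HOME/plugins/head"

# Step 1: park all plugins, restart the node, and watch CPU.
mv "$ES_HOME/plugins" "$ES_HOME/plugins.disabled"
mkdir "$ES_HOME/plugins"
# sudo service elasticsearch restart

# Step 2: re-enable one plugin at a time, restarting between each,
# until the high CPU load comes back.
mv "$ES_HOME/plugins.disabled/river-mongodb" "$ES_HOME/plugins/"
# sudo service elasticsearch restart

ls "$ES_HOME/plugins"
```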

Best regards,
Radu


On Thu, Apr 24, 2014 at 6:15 AM, Maziyar Panahi wrote:

> I thought this might help:
>
> [inline attachments not preserved in the archive]
>



-- 
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAHXA0_0xw0N%2B_buvFMM0t-zhFwrRUb8ssjH5kG7nAQhfmoOAJA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Use facets to group documents by index and type

2014-04-24 Thread Radu Gheorghe
Hello,

I'm not sure if you're already aware of the predefined _index and _type
fields:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-index-field.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-type-field.html

You can enable them so you can search and retrieve values like you would
with any other field.
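As a sketch (type and index names follow the thread's beer example; in 1.x the `_type` field is already indexed by default, so only `_index` needs enabling — please verify against the docs linked above; the body is only validated locally here):

```shell
# Mapping update that enables the predefined _index field for a type:
BODY='{
  "comment": {
    "_index": {"enabled": true}
  }
}'
echo "$BODY" | python3 -m json.tool > /dev/null && echo "mapping ok"
# curl -XPUT 'http://localhost:9200/light_beer/comment/_mapping' -d "$BODY"

# Afterwards, a terms facet/aggregation on "_index" should group hits per
# index without any custom __index field in the source:
# curl -XPOST 'http://localhost:9200/_search?search_type=count' -d \
#   '{"aggs": {"per_index": {"terms": {"field": "_index"}}}}'
```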

Best regards,
Radu


On Thu, Apr 24, 2014 at 11:28 AM, Sviatoslav Abakumov <
abakumov.sviatos...@progforce.com> wrote:

> Hello,
>
> I'd like to see how many documents belong to a particular index AND type, like this:
>
> {'f4': {'_type': 'terms',
> 'missing': 0,
> 'other': 0,
> 'terms': [{'count': 99, 'term': 'light_beer/comment'},
>   {'count': 99, 'term': 'dark_beer/comment'}],
> 'total': 198}}
>
> To do this I added fields `__index` and `__type` and filled them when I
> was indexing documents.
> I use term facet with `"script": "_source.__index + '/' + _source.__type"`
>
> Is there a way to get index and type names from `_index` (IndexLookup) or
> `_source` (SourceLookup)?
>
> Thank you.
>
>



-- 
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAHXA0_0z5jD8cZfkca8FNOmcg7FnEeXL_S%3DZmJVcDm7PArixLQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Best way to store variable number of key:value fields

2014-04-24 Thread Radu Gheorghe
You're welcome :)

On Thu, Apr 24, 2014 at 10:16 AM, Dominic Gross  wrote:

> Thats awesome, thank you for your help!
>
> Am Donnerstag, 24. April 2014 09:10:10 UTC+2 schrieb Radu Gheorghe:
>>
>> You can do that, too, yes.
>>
>>
>> On Thu, Apr 24, 2014 at 10:08 AM, Dominic Gross wrote:
>>
>>> So I can just write the two Documents (Apple and Mobilephone) to the
>>> same Index (and the same Type?).
>>>
>>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAHXA0_1qyz4VsC8uLFwf9yAgkZs4cYYZFMxPzk7JPGm5hKEWwA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Best way to store variable number of key:value fields

2014-04-24 Thread Radu Gheorghe
You can do that, too, yes.


On Thu, Apr 24, 2014 at 10:08 AM, Dominic Gross  wrote:

> So I can just write the two Documents (Apple and Mobilephone) to the same
> Index (and the same Type?).
>
>



-- 
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAHXA0_12sbbqWyGzu0EXCpsXOBrdXMA72KQucA4GWoxJKN%2BQxA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Spaces in terms in request body make the query return no results

2014-04-24 Thread Radu Gheorghe
On Thu, Apr 24, 2014 at 9:57 AM, Alexey Kotlyarov wrote:

>
>
>> Your message field is analyzed by default using the Standard Analyzer:
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-standard-analyzer.html
>>
>> This means your "test message" will become ["test", "message"].
>>
>> On the other hand, the prefix query isn't analyzed. Which means "test"
>> will match but "test " won't, because you have no term that begins with
>> that string.
>>
>> One solution for this is to index your message field as not_analyzed.
>> This will only generate the term "test message" which will match both
>> "test" and "test " prefixes. However, if you search for the "test" term, it
>> won't match because you have no such term.
>>
>> You can have the best of both worlds by indexing the same text multiple
>> times with multiple settings:
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_multi_fields.html
>>
>
> Why does the query work if given in the request query string then?
>

I don't think that's a valid query. I've just tried it, and if I put "bla" in
there I still get the result back. Basically, it runs a match_all query. It's
like doing this:

curl http://localhost:9200/twitter/tweet/_search?bla

If you want to do an URI search, you need to put things in the "q"
parameter:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-uri-request.html

But note that a URI search will run a query_string query, which is analyzed:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html
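To make the trailing-space case concrete, here's how such a URI search can be built with the space URL-encoded (index and type names are from the thread; the curl line assumes ES at localhost:9200 and is left commented):

```shell
# URL-encode the query text for the "q" parameter; the quotes keep
# "test " together as a phrase, and the space becomes %20:
Q=$(python3 -c 'from urllib.parse import quote; print(quote("message:\"test \""))')
URL="http://localhost:9200/twitter/tweet/_search?pretty&q=$Q"
echo "$URL"
# curl "$URL"
```

Keep in mind that because query_string is analyzed, the trailing space is stripped during analysis anyway, so this behaves differently from the unanalyzed prefix query in the request body.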

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAHXA0_2bi33sFpjtTXB7BDQZM-bm78YP96SOrRuJnwD5oJOm8Q%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Best way to store variable number of key:value fields

2014-04-23 Thread Radu Gheorghe
Hi Dominic,

It's not very clear to me what your question is. You want to reject Apple
documents that have an OS field?

I guess you can have Apples and Mobilephones each in their own type. This
makes it clear which fields belong to which kind of document. And because ES
has dynamic mappings enabled by default, you can always add new types of
documents, or new fields to an existing type:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-mapper.html#_dynamic_mappings
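For example, the two documents from the question could be indexed into separate types like this (the "products" index name is made up; the curl lines assume ES at localhost:9200 and the bodies are only validated locally):

```shell
# One type per product kind, so Apple never gets an Os field and the
# phone never gets a Flavor field; dynamic mapping creates both types:
APPLE='{"Color": "Green", "Flavor": "Sour"}'
PHONE='{"Color": "black", "Os": "Android"}'
for doc in "$APPLE" "$PHONE"; do
  echo "$doc" | python3 -m json.tool > /dev/null || exit 1
done
echo "docs ok"

# curl -XPUT 'http://localhost:9200/products/apple/1' -d "$APPLE"
# curl -XPUT 'http://localhost:9200/products/mobilephone/1' -d "$PHONE"
```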

Best regards
Radu
-- 
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Thu, Apr 24, 2014 at 9:52 AM, Dominic Gross  wrote:

> Hi,
> I want to index some data, where each document should have a variable
> number of fields.
> An example, because I can't express myself very well this morning:
>
> Apple
>   Color: Green
>   Flavor: Sour
>
> Mobilephone
>   Color: black
>   Os: Android
>
> My problem is that "Apple" should not have the field Os, and the
> Mobilephone, in turn, shouldn't have the field Flavor.
>
> I'm new to Elasticsearch, so maybe there is an easy answer for that.
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAHXA0_0DSe%2BmrzZfykt42%2BFQ39EZ_Rj-UOSQ7HaKKgCg9TPcNg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Spaces in terms in request body make the query return no results

2014-04-23 Thread Radu Gheorghe
Hi Alexey,

Your message field is analyzed by default using the Standard Analyzer:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-standard-analyzer.html

This means your "test message" will become ["test", "message"].

On the other hand, the prefix query isn't analyzed. Which means "test" will
match but "test " won't, because you have no term that begins with that
string.

One solution for this is to index your message field as not_analyzed. This
will only generate the term "test message" which will match both "test" and
"test " prefixes. However, if you search for the "test" term, it won't
match because you have no such term.

You can have the best of both worlds by indexing the same text multiple
times with multiple settings:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_multi_fields.html
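A sketch of such a multi-field mapping (the "raw" sub-field name is a common convention, not from the thread; index/type names follow the example, the body is only validated locally, and the curl lines assume ES 1.x at localhost:9200):

```shell
# "message" stays analyzed for full-text search; "message.raw" is
# not_analyzed, so prefix queries see the whole original string:
BODY='{
  "tweet": {
    "properties": {
      "message": {
        "type": "string",
        "fields": {
          "raw": {"type": "string", "index": "not_analyzed"}
        }
      }
    }
  }
}'
echo "$BODY" | python3 -m json.tool > /dev/null && echo "mapping ok"
# curl -XPUT 'http://localhost:9200/twitter/tweet/_mapping' -d "$BODY"

# A prefix query that must keep the trailing space then targets the raw
# field (documents need re-indexing after the mapping change):
# curl -XPOST 'http://localhost:9200/twitter/tweet/_search' -d \
#   '{"query": {"prefix": {"message.raw": "test "}}}'
```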

Best regards,
Radu
-- 
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Thu, Apr 24, 2014 at 9:42 AM, Alexey Kotlyarov wrote:

> Given a simple index:
>
> curl -XPUT http://localhost:9200/twitter/tweet/1 -d '{"message":
> "test message"}'
>
> A query for "test" returns the tweet:
>
> curl -XPOST 'http://localhost:9200/twitter/tweet/_search?pretty' -d
> '{"query": {"prefix": {"message": "test"}}}'
> curl -XGET 'http://localhost:9200/twitter/tweet/_search?pretty' -d
> '{"query": {"prefix": {"message": "test"}}}'
>
> However, if I search for "test ", there are no results:
>
> curl -XPOST 'http://localhost:9200/twitter/tweet/_search?pretty' -d
> '{"query": {"prefix": {"message": "test "}}}'
> curl -XGET 'http://localhost:9200/twitter/tweet/_search?pretty' -d
> '{"query": {"prefix": {"message": "test "}}}'
>
> However again, the same query works fine if put into the URL (using wget
> and not curl because curl tries to expand the braces):
>
> wget -O - '
> http://localhost:9200/twitter/tweet/_search?{%22query%22:{%22prefix%22:{%22message%22:%22test%20%22}}}
> '
>
> How do I make the queries with "test " work when they are supplied in the
> request body?
>
> My Elasticsearch version is 1.1.1.
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAHXA0_1xAjxqGHTcT-7dQqEQRWoqLEXwkG5czbOUWNpasFpjdg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.