Re: Is it safe to change node names in an existing ElasticSearch cluster

2015-02-18 Thread Jan-Erik Westlund
Correct, in that case it will not be a rolling upgrade ;-) The service
will be down for a few minutes.
Can I then change all the node names, and then start the services on all the
nodes with the new names, without messing things up?

2015-02-19 7:58 GMT+01:00 David Pilato :

> You should define this in that case:
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-gateway.html#recover-after
>
> But it's not anymore a rolling upgrade, right? Your service will be down
> for some seconds/minutes I guess.
>
>
> David
>
> On 19 Feb 2015 at 07:52, Jan-Erik Westlund wrote:
>
> I understand that, but is it safe to change all the nodenames and restart
> all the nodes at the same time ?
>
> Sent from my iPhone 6.
>
> On 19 Feb 2015 at 07:47, David Pilato wrote:
>
> You can change safely elasticsearch.yml file while elasticsearch is
> running.
> This file is only loaded when elasticsearch starts.
>
> David
>
> On 19 Feb 2015 at 07:33, Jan-Erik Westlund wrote:
>
> Hi again !
>
> Thanks for Rolling restart info, that was really helpful.
> But since the "elasticsearch.yml" file is managed by Puppet, all the
> nodenames will change pretty much at the same time !
> So in my case it would be best to shutdown the ES daemon on all nodes
> first, apply the Puppet changes and then start the ES cluster again...
> Is it safe to do so ?
>
> //Jan-Erik
>
> On Wednesday 18 February 2015 at 16:44:35 UTC+1, David Pilato wrote:
>>
>> Have a look at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-shutdown.html#_rolling_restart_of_nodes_full_cluster_restart
>>
>> --
>> David Pilato | Technical Advocate | Elasticsearch.com
>> @dadoonet | @elasticsearchfr | @scrutmydocs
>>
>>
>>
>> On 18 Feb 2015 at 16:37, Jan-Erik Westlund wrote:
>>
>> Thanks David !
>>
>> All my "Recovery Throttling settings" are default in the
>> elasticsearch.yml file.
>> How do I disable allocation, in a running production environment ?
>> Do I need to disable allocation first, restart each node / daemon, and
>> after rename the nodes ?
>>
>> Or maybe it would be better to down the ES cluster (all 3 nodes) during a
>> maintenance windows, change all names, and then restart the ES cluster
>> nodes again ?
>>
>> //Jan-Erik
>>
>> On Wednesday 18 February 2015 at 16:18:42 UTC+1, David Pilato wrote:
>>>
>>> Yes. It’s safe.
>>> You can do it one at a time.
>>>
>>> If you already have data around and don’t want your shards moving during
>>> this, you should disable allocation.
>>>
>>>
>>> --
>>> David Pilato | Technical Advocate | Elasticsearch.com
>>> @dadoonet | @elasticsearchfr | @scrutmydocs
>>>
>>>
>>>
>>> On 18 Feb 2015 at 16:14, Jan-Erik Westlund wrote:
>>>
>>> Hi !
>>>
>>> Is it safe to change the node names of my 3 nodes in an existing
>>> elasticsearch 1.4.0 cluster ?
>>>
>>> The reason is to get rid of the random names like: Elizabeth "Betsy"
>>> Braddock, Franz Kafka, etc...
>>>
>>> Is it just to set the node.name: "server name" in elasticsearch.yml and
>>> then restart the daemon ?
>>> Do I do it one node at the time, or do I need down the cluster and then
>>> change all node names, and then bring up the cluster again ?
>>>
>>>
>>> //Jan-Erik
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0bed6a3d-9315-4060-9585-cf68907f844b%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>>
>>>

Re: Aggregations failing on fields with custom analyzer..

2015-02-18 Thread Anil Karaka
http://stackoverflow.com/questions/28601082/terms-aggregation-failing-on-string-fields-with-a-custom-analyzer-in-elasticsear

Posted on Stack Overflow as well.




Aggregations failing on fields with custom analyzer..

2015-02-18 Thread Anil Karaka
I wanted a custom analyzer that behaves exactly like not_analyzed, except 
that fields are case insensitive..

I have my analyzer as below, 

"index": {
    "analysis": {
        "analyzer": {
            // Custom analyzer with keyword tokenizer and lowercase filter,
            // same as not_analyzed but case-insensitive
            "case_insensitive_keyword_analyzer": {
                "tokenizer": "keyword",
                "filter": "lowercase"
            }
        }
    }
}

But when I try a terms aggregation over a field with strings analyzed as
above, I get this error:

{
    "error": "ClassCastException[org.elasticsearch.search.aggregations.bucket.terms.DoubleTerms$Bucket cannot be cast to org.elasticsearch.search.aggregations.bucket.terms.StringTerms$Bucket]",
    "status": 500
}

Are there additional settings I have to update in my custom analyzer for the
terms aggregation to work?


The better question: I want a custom analyzer that behaves exactly like
not_analyzed but is case-insensitive. How do I achieve that?
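For what it's worth, that ClassCastException (DoubleTerms vs. StringTerms buckets) usually points at the same aggregation running over several indices in which the field is mapped as a string in some and as a number in others, rather than at the analyzer itself. A sketch of settings plus a mapping that applies the analyzer above to a string field (index, type, and field names here are invented, not from the post):

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "case_insensitive_keyword_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "event": {
      "properties": {
        "tag": {
          "type": "string",
          "analyzer": "case_insensitive_keyword_analyzer"
        }
      }
    }
  }
}
```

With that mapping in every index the aggregation touches, a terms aggregation on "tag" should bucket values case-insensitively while keeping each value as a single token.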





Re: Is it safe to change node names in an existing ElasticSearch cluster

2015-02-18 Thread David Pilato
You should define this in that case: 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-gateway.html#recover-after

But then it's no longer a rolling upgrade, right? Your service will be down for
some seconds or minutes, I guess.
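For a 3-node cluster like this one, the gateway settings behind that link look roughly as follows in elasticsearch.yml (the values are illustrative assumptions, not from the thread):

```yaml
gateway.recover_after_nodes: 2   # start recovery once two nodes are up
gateway.expected_nodes: 3        # ...or immediately once all three have joined
gateway.recover_after_time: 5m   # otherwise wait at most five minutes
```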


David

> On 19 Feb 2015 at 07:52, Jan-Erik Westlund wrote:
> 
> I understand that, but is it safe to change all the nodenames and restart all 
> the nodes at the same time ?
> 
> Sent from my iPhone 6.
> 
>> On 19 Feb 2015 at 07:47, David Pilato wrote:
>> 
>> You can change safely elasticsearch.yml file while elasticsearch is running.
>> This file is only loaded when elasticsearch starts.
>> 
>> David
>> 
>>> On 19 Feb 2015 at 07:33, Jan-Erik Westlund wrote:
>>> 
>>> Hi again !
>>> 
>>> Thanks for Rolling restart info, that was really helpful.
>>> But since the "elasticsearch.yml" file is managed by Puppet, all the 
>>> nodenames will change pretty much at the same time !
>>> So in my case it would be best to shutdown the ES daemon on all nodes 
>>> first, apply the Puppet changes and then start the ES cluster again...
>>> Is it safe to do so ?
>>> 
>>> //Jan-Erik
>>> 
 On Wednesday 18 February 2015 at 16:44:35 UTC+1, David Pilato wrote:
 Have a look at 
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-shutdown.html#_rolling_restart_of_nodes_full_cluster_restart
 
 -- 
 David Pilato | Technical Advocate | Elasticsearch.com
 @dadoonet | @elasticsearchfr | @scrutmydocs
 
 
 
> On 18 Feb 2015 at 16:37, Jan-Erik Westlund wrote:
> 
> Thanks David !
> 
> All my "Recovery Throttling settings" are default in the 
> elasticsearch.yml file.
> How do I disable allocation, in a running production environment ? 
> Do I need to disable allocation first, restart each node / daemon, and 
> after rename the nodes ?
> 
> Or maybe it would be better to down the ES cluster (all 3 nodes) during a 
> maintenance windows, change all names, and then restart the ES cluster 
> nodes again ?
> 
> //Jan-Erik
> 
>> On Wednesday 18 February 2015 at 16:18:42 UTC+1, David Pilato wrote:
>> Yes. It’s safe.
>> You can do it one at a time.
>> 
>> If you already have data around and don’t want your shards moving during 
>> this, you should disable allocation.
>> 
>> 
>> -- 
>> David Pilato | Technical Advocate | Elasticsearch.com
>> @dadoonet | @elasticsearchfr | @scrutmydocs
>> 
>> 
>> 
>>> On 18 Feb 2015 at 16:14, Jan-Erik Westlund wrote:
>>> 
>>> Hi !
>>> 
>>> Is it safe to change the node names of my 3 nodes in an existing 
>>> elasticsearch 1.4.0 cluster ?
>>> 
>>> The reason is to get rid of the random names like: Elizabeth "Betsy" 
>>> Braddock, Franz Kafka, etc...
>>> 
>>> Is it just to set the node.name: "server name" in elasticsearch.yml and 
>>> then restart the daemon ? 
>>> Do I do it one node at the time, or do I need down the cluster and then 
>>> change all node names, and then bring up the cluster again ?
>>> 
>>> 
>>> //Jan-Erik
>>> 
>> 
> 
> 
 
>>> 

Re: Is it safe to change node names in an existing ElasticSearch cluster

2015-02-18 Thread Jan-Erik Westlund
I understand that, but is it safe to change all the node names and restart all
the nodes at the same time?

Sent from my iPhone 6.

> On 19 Feb 2015 at 07:47, David Pilato wrote:
> 
> You can change safely elasticsearch.yml file while elasticsearch is running.
> This file is only loaded when elasticsearch starts.
> 
> David
> 
>> On 19 Feb 2015 at 07:33, Jan-Erik Westlund wrote:
>> 
>> Hi again !
>> 
>> Thanks for Rolling restart info, that was really helpful.
>> But since the "elasticsearch.yml" file is managed by Puppet, all the 
>> nodenames will change pretty much at the same time !
>> So in my case it would be best to shutdown the ES daemon on all nodes first, 
>> apply the Puppet changes and then start the ES cluster again...
>> Is it safe to do so ?
>> 
>> //Jan-Erik
>> 
>>> On Wednesday 18 February 2015 at 16:44:35 UTC+1, David Pilato wrote:
>>> Have a look at 
>>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-shutdown.html#_rolling_restart_of_nodes_full_cluster_restart
>>> 
>>> -- 
>>> David Pilato | Technical Advocate | Elasticsearch.com
>>> @dadoonet | @elasticsearchfr | @scrutmydocs
>>> 
>>> 
>>> 
 On 18 Feb 2015 at 16:37, Jan-Erik Westlund wrote:
 
 Thanks David !
 
 All my "Recovery Throttling settings" are default in the elasticsearch.yml 
 file.
 How do I disable allocation, in a running production environment ? 
 Do I need to disable allocation first, restart each node / daemon, and 
 after rename the nodes ?
 
 Or maybe it would be better to down the ES cluster (all 3 nodes) during a 
 maintenance windows, change all names, and then restart the ES cluster 
 nodes again ?
 
 //Jan-Erik
 
> On Wednesday 18 February 2015 at 16:18:42 UTC+1, David Pilato wrote:
> Yes. It’s safe.
> You can do it one at a time.
> 
> If you already have data around and don’t want your shards moving during 
> this, you should disable allocation.
> 
> 
> -- 
> David Pilato | Technical Advocate | Elasticsearch.com
> @dadoonet | @elasticsearchfr | @scrutmydocs
> 
> 
> 
>> On 18 Feb 2015 at 16:14, Jan-Erik Westlund wrote:
>> 
>> Hi !
>> 
>> Is it safe to change the node names of my 3 nodes in an existing 
>> elasticsearch 1.4.0 cluster ?
>> 
>> The reason is to get rid of the random names like: Elizabeth "Betsy" 
>> Braddock, Franz Kafka, etc...
>> 
>> Is it just to set the node.name: "server name" in elasticsearch.yml and 
>> then restart the daemon ? 
>> Do I do it one node at the time, or do I need down the cluster and then 
>> change all node names, and then bring up the cluster again ?
>> 
>> 
>> //Jan-Erik
>> 
 
 
>> 


Re: Is it safe to change node names in an existing ElasticSearch cluster

2015-02-18 Thread David Pilato
You can safely change the elasticsearch.yml file while Elasticsearch is running.
The file is only read when Elasticsearch starts.
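The rename itself is then a one-line change in elasticsearch.yml (the hostname shown here is hypothetical):

```yaml
node.name: "es-prod-01"   # replaces the randomly picked Marvel-character name
```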

David

> On 19 Feb 2015 at 07:33, Jan-Erik Westlund wrote:
> 
> Hi again !
> 
> Thanks for Rolling restart info, that was really helpful.
> But since the "elasticsearch.yml" file is managed by Puppet, all the 
> nodenames will change pretty much at the same time !
> So in my case it would be best to shutdown the ES daemon on all nodes first, 
> apply the Puppet changes and then start the ES cluster again...
> Is it safe to do so ?
> 
> //Jan-Erik
> 
>> On Wednesday 18 February 2015 at 16:44:35 UTC+1, David Pilato wrote:
>> Have a look at 
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-shutdown.html#_rolling_restart_of_nodes_full_cluster_restart
>> 
>> -- 
>> David Pilato | Technical Advocate | Elasticsearch.com
>> @dadoonet | @elasticsearchfr | @scrutmydocs
>> 
>> 
>> 
>>> On 18 Feb 2015 at 16:37, Jan-Erik Westlund wrote:
>>> 
>>> Thanks David !
>>> 
>>> All my "Recovery Throttling settings" are default in the elasticsearch.yml 
>>> file.
>>> How do I disable allocation, in a running production environment ? 
>>> Do I need to disable allocation first, restart each node / daemon, and 
>>> after rename the nodes ?
>>> 
>>> Or maybe it would be better to down the ES cluster (all 3 nodes) during a 
>>> maintenance windows, change all names, and then restart the ES cluster 
>>> nodes again ?
>>> 
>>> //Jan-Erik
>>> 
 On Wednesday 18 February 2015 at 16:18:42 UTC+1, David Pilato wrote:
 Yes. It’s safe.
 You can do it one at a time.
 
 If you already have data around and don’t want your shards moving during 
 this, you should disable allocation.
 
 
 -- 
 David Pilato | Technical Advocate | Elasticsearch.com
 @dadoonet | @elasticsearchfr | @scrutmydocs
 
 
 
> On 18 Feb 2015 at 16:14, Jan-Erik Westlund wrote:
> 
> Hi !
> 
> Is it safe to change the node names of my 3 nodes in an existing 
> elasticsearch 1.4.0 cluster ?
> 
> The reason is to get rid of the random names like: Elizabeth "Betsy" 
> Braddock, Franz Kafka, etc...
> 
> Is it just to set the node.name: "server name" in elasticsearch.yml and 
> then restart the daemon ? 
> Do I do it one node at the time, or do I need down the cluster and then 
> change all node names, and then bring up the cluster again ?
> 
> 
> //Jan-Erik
> 
 
>>> 
>>> 
>> 
> 



Re: multiple words but same token

2015-02-18 Thread David Pilato
This will not produce exactly what you are looking for, but I would use
shingles:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-shingle-tokenfilter.html

Then boost results found with the shingle analyzer over those from the standard
analyzer, so results containing "car A" will appear first.
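A sketch of what such an analysis section could look like (the names are invented; the idea is to index the same text into a second, shingled field and boost matches on it in the query):

```json
{
  "analysis": {
    "filter": {
      "two_word_shingles": {
        "type": "shingle",
        "min_shingle_size": 2,
        "max_shingle_size": 2
      }
    },
    "analyzer": {
      "shingle_analyzer": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": ["asciifolding", "lowercase", "two_word_shingles"]
      }
    }
  }
}
```

By default the shingle filter also emits the original unigrams, so both "car" and "car a" end up in the index; a bool query can then boost the shingle field.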

David

> On 18 Feb 2015 at 23:45, NM wrote:
> 
> Hi,
> 
> I would like to index 
> 
> "driving with car A is a bad thing"
> 
> knowing that I have special words composed of several words e.g. "car A" and 
> "car B".
> 
> If I use the standard approach, the "A" or "B" is lost. Ideally, for "driving with
> car A is a bad thing" the analyzer should return this:
> ("driving", "car A", "bad", "thing")
>
> Help, how would you do that?
> 
> --- 
> my filters so far:
> filter: {   
> protect_words:{
>   type: "keyword_marker",
>   keywords: ["car a", "car b"]
> },
> protect_words2:{
>   type: "word_delimiter",
>   protected_words: ["car a", "car b"]
> }
>   },  
>   analyzer: {
>   
> custom_level2: {  
>  tokenizer: 'standard',
>   filter: ["asciifolding", "lowercase", "protect_words"],
>   type: 'custom'
> },
> custom_level1: {   
>   tokenizer: 'keyword',   
>   filter: ["asciifolding", "lowercase", "protect_words2"],
>   type: 'custom'
> },
> }
> } 
> 
> any help?
> 



Re: ClassNotFoundException: org.elasticsearch.discovery.ec2.Ec2DiscoveryModule

2015-02-18 Thread David Pilato
Yes.

This should be added to the doc: 
https://github.com/elasticsearch/elasticsearch-cloud-aws/issues/176

You need to add this dependency if you are using a NodeClient:

<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-cloud-aws</artifactId>
    <version>2.4.1</version>
</dependency>

HTH
David

> On 19 Feb 2015 at 01:15, Diana Tuck wrote:
> 
> New to ES - Trying to use the elasticsearch-cloud-aws plugin, but when 
> starting my java client node, I'm getting ClassNotFoundException on 
> org.elasticsearch.discovery.ec2.Ec2DiscoveryModule.   Do I need to install 
> this plugin on java client nodes, and if so, how does one do that?  Or, 
> rather, is there a maven dependency that can be referenced to load these 
> required classes?
> 
> For reference, the elasticsearch.yaml is:
> 
> plugin.mandatory: cloud-aws
> cloud:
>   aws:
>     access_key: **
>     secret_key: *
> discovery:
>   type: ec2
> 
> and my java client code is:
> 
> Settings settings = ImmutableSettings.settingsBuilder()
> .put("node.name", nodeName)
> .put("cloud.aws.access_key", awsAccessKey)
> .put("cloud.aws.secret_key", awsSecretKey)
> .put("cloud.node.auto_attributes", true)
> .put("discovery.type", "ec2")
> .build();
> this.node = nodeBuilder()
> .clusterName(clusterName)
> .settings(settings)
> .client(true)
> .node();
> this.client = node.client();



Re: Is it safe to change node names in an existing ElasticSearch cluster

2015-02-18 Thread Jan-Erik Westlund
Hi again !

Thanks for the rolling-restart info, that was really helpful.
But since the "elasticsearch.yml" file is managed by Puppet, all the
node names will change at pretty much the same time!
So in my case it would be best to shut down the ES daemon on all nodes
first, apply the Puppet changes, and then start the ES cluster again...
Is it safe to do so?

//Jan-Erik
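To sketch the procedure being discussed (an assumption-laden outline, not something from the thread): shard allocation can be switched off via the cluster settings API before the shutdown and back on once all renamed nodes have rejoined. On ES 1.x the request bodies look like this:

```python
import json

def allocation_settings(enable):
    """Body for PUT /_cluster/settings to toggle shard allocation (ES 1.x)."""
    value = "all" if enable else "none"
    return {"transient": {"cluster.routing.allocation.enable": value}}

# e.g. curl -XPUT localhost:9200/_cluster/settings -d '<body>'
print(json.dumps(allocation_settings(False)))  # before stopping the nodes
print(json.dumps(allocation_settings(True)))   # after all nodes have rejoined
```

The renames themselves then happen while allocation is off, so shards are not rebalanced while nodes come and go.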

On Wednesday 18 February 2015 at 16:44:35 UTC+1, David Pilato wrote:
>
> Have a look at 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-shutdown.html#_rolling_restart_of_nodes_full_cluster_restart
>
> -- 
> David Pilato | Technical Advocate | Elasticsearch.com
> @dadoonet | @elasticsearchfr | @scrutmydocs
>
>
>  
> On 18 Feb 2015 at 16:37, Jan-Erik Westlund wrote:
>
> Thanks David !
>
> All my "Recovery Throttling settings" are default in the 
> elasticsearch.yml file.
> How do I disable allocation, in a running production environment ? 
> Do I need to disable allocation first, restart each node / daemon, and 
> after rename the nodes ?
>
> Or maybe it would be better to down the ES cluster (all 3 nodes) during a 
> maintenance windows, change all names, and then restart the ES cluster 
> nodes again ?
>
> //Jan-Erik
>
> On Wednesday 18 February 2015 at 16:18:42 UTC+1, David Pilato wrote:
>>
>> Yes. It’s safe.
>> You can do it one at a time.
>>
>> If you already have data around and don’t want your shards moving during 
>> this, you should disable allocation.
>>
>>
>> -- 
>> David Pilato | Technical Advocate | Elasticsearch.com
>> @dadoonet | @elasticsearchfr | @scrutmydocs
>>
>>
>>  
>> On 18 Feb 2015 at 16:14, Jan-Erik Westlund wrote:
>>
>> Hi !
>>
>> Is it safe to change the node names of my 3 nodes in an existing 
>> elasticsearch 1.4.0 cluster ?
>>
>> The reason is to get rid of the random names like: Elizabeth "Betsy" 
>> Braddock, Franz Kafka, etc...
>>
>> Is it just to set the node.name: "server name" in elasticsearch.yml and 
>> then restart the daemon ? 
>> Do I do it one node at the time, or do I need down the cluster and then 
>> change all node names, and then bring up the cluster again ?
>>
>>
>> //Jan-Erik
>>



Re: ElasticSearch search performance question

2015-02-18 Thread Jay Danielian
Just to update the thread.

I added code to disable cache on all the term filters we were using, and it 
made a huge performance improvement. Now we are able to service the queries 
with average response time under two seconds, which is excellent (we are 
bundling several searches using _msearch, so < 2 seconds total response is 
good.) The search requests/sec metric still peaks at around 600/sec, but our
CPU now "only" spikes to about 65%, so I think we can add more search threads
to our config since we are no longer maxing out CPU. I also see a bit of disk
read activity now against our non-RAID EBS drive, which means we may be able
to squeeze out more if we switch the disk setup.

It seems like having these filters add cache items was wasting CPU on cache 
eviction and cache lookups (cache misses really) for each query - which 
really only shows up when trying to push some load through.
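The change described amounts to adding "_cache": false inside each term filter. A minimal sketch of one such query body in ES 1.x syntax (the field name and value are invented placeholders, not taken from the real queries):

```python
import json

# Filtered query with a non-cached term filter (ES 1.x syntax).
# "contact_id" and its value are made-up placeholders.
query = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {
                "term": {
                    "contact_id": "12345",
                    "_cache": False,  # skip the filter cache for one-off values
                }
            }
        }
    }
}
print(json.dumps(query))
```

For high-cardinality, rarely repeated values like these, skipping the cache avoids paying for an eviction and a miss on every request.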

Thanks for everyone's suggestions!!

J

On Friday, February 13, 2015 at 11:55:52 AM UTC-5, Jay Danielian wrote:
>
> Thanks to all for these great suggestions. I haven't had a chance to 
> change the syntax yet, as that is a risky thing for me to quickly change 
> against our production setup. My plan is to try that this weekend (so I can 
> properly test the new syntax is returning the same results). However, is 
> there a way to turn filter caching off globally via config or elsewhere?
>
> Thanks!
>
> J
>
> On Friday, February 13, 2015 at 11:25:20 AM UTC-5, Mark Harwood wrote:
>>
>> So I can see in the hot threads dump the initialization requests for 
>> those FixedBitSets I was talking about.
>> Looking at the number of docs in your index I estimate each Term to be 
>> allocating 140mb of memory in total for all these bitsets across all shards 
>> given the >1bn docs in your index. Remember that you are probably setting 
>> only a single bit in each of these large structures. 
>> Another stat (if I read it correctly) shows >5m evictions of these cached 
>> filters given their low reusability. It's fair to say you have some cache 
>> churn going on :)
>> Did you try my earlier suggestion of queries not filters?
>>
>>
>>
>>
>> On Friday, February 13, 2015 at 2:29:42 PM UTC, Jay Danielian wrote:
>>>
>>> As requested here is a dump of the hot threads output. 
>>>
>>> Thanks!
>>>
>>> J
>>>
>>> On Thursday, February 12, 2015 at 6:45:23 PM UTC-5, Nikolas Everett 
>>> wrote:

 You might want to try hitting hot threads while putting your load on it 
 and seeing what you see.  Or posting it.

 Nik

 On Thu, Feb 12, 2015 at 4:44 PM, Jay Danielian <
 jay.da...@circleback.com> wrote:

> Mark,
>
> Thanks for the initial reply. Yes, your assumption about these things 
> being very specific and thus not likely to have any re-use with regards 
> to 
> caching is correct. I have attached some screenshots from the BigDesk 
> plugin which showed a decent snapshot of what the server looked like 
> while 
> my tests were running. You can see the spikes in CPU, that essentially 
> covered the duration when the JMeter tests were running. 
>
> At a high level, the only thing that seems to be really stressed on 
> the server is CPU. That makes me think there is something in my 
> setup, query syntax, or perhaps the cache eviction rate, etc., that is 
> causing it to spike so high. I also have concerns about the non-RAID-0 
> EBS 
> volumes, as I know that having one large volume does not maximize 
> throughput; however, just looking at the stats, it doesn't seem like IO 
> is 
> really a bottleneck.
>
> Here is a sample query structure => 
> https://gist.github.com/jaydanielian/c2be885987f344031cfc
>
> Also this is one query - in reality we use _msearch to pipeline 
> several of these queries in one batch. The queries also include custom 
> routing / route key to make sure we only hit one shard.
>
> Thanks!
>
> J
>
>
> On Thursday, February 12, 2015 at 4:22:29 PM UTC-5, Mark Walkom wrote:
>>
>> It'd help if you could gist/pastebin/etc a query example.
>>
>> Also your current ES and java need updating, there are known issues 
>> with java <1.7u55, and you will always see performance boosts running 
>> the 
>> latest version of ES.
>>
>> That aside, what is your current resource utilisation like?  Are you 
>> seeing lots of cache evictions, high heap use, high CPU, IO delays?
>>
>> On 13 February 2015 at 07:32, Jay Danielian > > wrote:
>>
>>> I know this is difficult to answer, the real answer is always "It 
>>> Depends" :) But I am going to go ahead and hope I get some feedback 
>>> here.
>>>
>>> We are mainly using ES to issue terms searches against fields that 
>>> are non-analyzed. We are using ES like a key value store, where once 
>>> the 
>>> match is found we parse

Re: Elasticsearch performance tuning

2015-02-18 Thread Deva Raj
Hi Mark Walkom,

Thanks, Mark. Did I miss anything when tuning Elasticsearch performance?

I added the following to the Elasticsearch settings:

Java heap size: half of physical memory
bootstrap.mlockall: true
indices.fielddata.cache.size: "30%"
indices.cache.filter.size: "30%"
index.translog.flush_threshold_ops: 5
indices.memory.index_buffer_size: 50%


On Thursday, February 19, 2015 at 7:25:27 AM UTC+5:30, Mark Walkom wrote:
>
> 1. It depends
> 2. It depends
> 3. It depends
> 4. It also depends.
>
> The performance of ES is dependent on you; your data, your use, your 
> queries, your hardware, your configuration. If those are the results you 
> got, then they are indicative of your setup and thus your benchmark, and 
> from there you can tweak and try to improve performance.
>
> Monitoring LS is a little harder as there are no APIs for it (yet). Most 
> of its performance will depend on your filters (especially grok).
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/54a9031b-1e73-42b7-92b9-7ae3bda46ee7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Efficient sorting by distance to a point

2015-02-18 Thread Nicholas Gledhill
As a final addition to the problem... here's an example of what a Feature 
looks like:

 {
"_index": "location-v1",
"_type": "Feature",
"_id": "mvbjukyzRSatqOsq6CgwvA",
"_score": 2324118,
"_source": {
   "name": "1075365",
   "geometry": {
  "type": "Polygon",
  "coordinates": [
 [
[
   151.1853804162,
   -33.8939659845
],
[
   151.185184928,
   -33.8933718015
],
[
   151.185956032,
   -33.893205394
],
[
   151.185920704,
   -33.8931036995
],
[
   151.18615344,
   -33.8930332145
],
[
   151.1861488642,
   -33.8930247415
],
[
   151.186198496,
   -33.8930146405
],
[
   151.186202304,
   -33.893022873
],
[
   151.1864536,
   -33.893595448
],
[
   151.1862055362,
   -33.893669337
],
[
   151.18616032,
   -33.8936747205
],
[
   151.186197248,
   -33.8937595245
],
[
   151.185987712,
   -33.893822332
],
[
   151.1853804162,
   -33.8939659845
]
 ]
  ]
   },
   "FeatureCollection": "MB_2011_NSW",
   "point": null,
   "predictive_text": "1075365",
   "owner": "admin",
   "properties": {
  "MB_2011_NSW.MB_CODE11": "1075365",
  "MB_2011_NSW.MB_CAT11": "Residential",
  "MB_2011_NSW.SA1_MAIN11": "11703133251",
  "MB_2011_NSW.SA1_7DIG11": "1133251",
  "MB_2011_NSW.SA2_MAIN11": "117031332",
  "MB_2011_NSW.SA2_5DIG11": "11332",
  "MB_2011_NSW.SA2_NAME11": "Newtown - Camperdown - 
Darlington",
  "MB_2011_NSW.SA3_CODE11": "11703",
  "MB_2011_NSW.SA3_NAME11": "Sydney Inner City",
  "MB_2011_NSW.SA4_CODE11": "117",
  "MB_2011_NSW.SA4_NAME11": "Sydney - City and Inner South",
  "MB_2011_NSW.GCC_CODE11": "1GSYD",
  "MB_2011_NSW.GCC_NAME11": "Greater Sydney",
  "MB_2011_NSW.STE_CODE11": "1",
  "MB_2011_NSW.STE_NAME11": "New South Wales",
  "MB_2011_NSW.ALBERS_SQM": 7065.8147810602
   },
   "status": "usable"
}
 }

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/55b586ca-1507-4389-baae-d61b77f6b030%40googlegroups.com.


Re: Help with 4 node cluster

2015-02-18 Thread christian . dahlqvist
Hi,

You always want an odd number of master-eligible nodes (often 3), so I would 
therefore recommend making three of the four nodes master-eligible and 
leaving the fourth as a pure data node. This prevents the cluster from being 
partitioned into two sides that each have an equal number of master-eligible 
nodes.

Best regards,

Christian
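
A sketch of the relevant elasticsearch.yml settings for this layout (assuming ES 1.x defaults elsewhere); minimum_master_nodes should be a quorum of the master-eligible nodes, here 3/2 + 1 = 2:

```yaml
# Nodes A, B, C (master-eligible data nodes)
node.master: true
node.data: true

# Node D (data-only)
node.master: false
node.data: true

# All nodes: require a quorum of master-eligible nodes before electing
discovery.zen.minimum_master_nodes: 2
```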

On Wednesday, February 18, 2015 at 11:39:54 AM UTC, sysads wrote:
>
> Hi
>
> I am in need of help on setting up a 4 node elasticsearch servers. I have 
> installed and configured ES on all 4 nodes but I am lost as to what the 
> configuration in elasticsearch.yml will be:
>
> - if I want to have all 4 nodes both master and data
> - make node A act as primary shard while node B acts as its replica then 
> node C as primary shard while node D as its own replica.
>
> Thanks
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/956bd6e5-44fa-448e-93aa-9a7623972b42%40googlegroups.com.


Re: Node names the same?

2015-02-18 Thread Mark Walkom
I'd say there is still something referring to the old config file.
Try running the process and pointing it at the actual config file:
-Des.config=$path
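
Assuming a tarball-style install, that would look something like this (paths are illustrative):

```shell
# Start the instance against an explicit config file
bin/elasticsearch -Des.config=/etc/elasticsearch/node1/elasticsearch.yml

# Then confirm which settings each running node actually loaded
curl -s 'localhost:9200/_nodes/_local/settings?pretty'
```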

On 19 February 2015 at 16:27, ElasticSearch Noobie  wrote:

> Anyone? I'm sorry if this is a silly question, I'm still trying to get my
> head around this.
>
>
> On Wednesday, February 18, 2015 at 7:22:46 PM UTC+11, Marcos Georgopoulos
> wrote:
>>
>> Hi All,
>>
>> I am starting up 3 elasticsearch services to try to utilise the large
>> amount of memory my server has (~390GB).
>>
>> I have just made copies of the /etc/init.d script and changed them to
>> point to their own configuration files. Then I ran update-rc.d to ensure
>> they run on boot.
>>
>> I can see the 3 processes running, all pointing to their own config files.
>>
>>
>> Each configuration file has the same cluster name
>>
>>
>>
>> ### Cluster
>> ###
>>
>> # Cluster name identifies your cluster for auto-discovery. If you're
>> running
>> # multiple clusters on the same network, make sure you're using unique
>> names.
>> #
>> cluster.name: elasticsearch
>>
>>
>> However I've given them different Node names.  i.e Node1, Node2 and Node3
>>
>>  Node 
>> #
>>
>>
>> # Node names are generated dynamically on startup, so you're relieved
>> # from configuring them manually. You can tie this node to a specific
>> name:
>> #
>> node.name: Node1
>>
>> However, when I run 'curl localhost:9200/_nodes/process?pretty'
>>  I notice they all have the same node name. Is that correct? Also, does
>> everything else look ok?
>>
>>
>> {
>>   "cluster_name" : "elasticsearch",
>>   "nodes" : {
>> "h74xaIssRkui4_jO-kwsfA" : {
>>   "name" : "Node1",
>>   "transport_address" : "inet[/10.166.132.11:9302]",
>>   "host" : "msi-n-logz-1",
>>   "ip" : "10.166.132.11",
>>   "version" : "1.4.2",
>>   "build" : "927caff",
>>   "http_address" : "inet[/10.166.132.11:9202]",
>>   "attributes" : {
>> "master" : "true"
>>   },
>>   "process" : {
>> "refresh_interval_in_millis" : 1000,
>> "id" : 1892,
>> "max_file_descriptors" : 65535,
>> "mlockall" : false
>>   }
>> },
>> "8fhp_JSCRZS27gdXNa8KDg" : {
>>   "name" : "Node1",
>>   "transport_address" : "inet[/10.166.132.11:9301]",
>>   "host" : "msi-n-logz-1",
>>   "ip" : "10.166.132.11",
>>   "version" : "1.4.2",
>>   "build" : "927caff",
>>   "http_address" : "inet[/10.166.132.11:9201]",
>>   "attributes" : {
>> "master" : "true"
>>   },
>>   "process" : {
>> "refresh_interval_in_millis" : 1000,
>> "id" : 1660,
>> "max_file_descriptors" : 65535,
>> "mlockall" : false
>>   }
>> },
>> "_-R2UbIoQNurnZA8_ibLxg" : {
>>   "name" : "Node1",
>>   "transport_address" : "inet[/10.166.132.11:9300]",
>>   "host" : "msi-n-logz-1",
>>   "ip" : "10.166.132.11",
>>   "version" : "1.4.2",
>>   "build" : "927caff",
>>   "http_address" : "inet[/10.166.132.11:9200]",
>>   "attributes" : {
>> "master" : "true"
>>   },
>>   "process" : {
>> "refresh_interval_in_millis" : 1000,
>> "id" : 1636,
>> "max_file_descriptors" : 65535,
>> "mlockall" : false
>>   }
>> },
>> "xPNfdj-rR_SrEbgpxwdWEg" : {
>>   "name" : "logstash-msi-n-logz-1-625-2018",
>>   "transport_address" : "inet[/10.166.132.11:9303]",
>>   "host" : "msi-n-logz-1",
>>   "ip" : "10.166.132.11",
>>   "version" : "1.1.1",
>>   "build" : "f1585f0",
>>   "attributes" : {
>> "client" : "true",
>> "data" : "false"
>>   },
>>   "process" : {
>> "refresh_interval_in_millis" : 1000,
>> "id" : 625,
>> "max_file_descriptors" : 65535,
>> "mlockall" : false
>>   }
>> }
>>   }
>> }
>>
>> Many thanks!
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/478cc5dd-a9b0-4d3e-9bf8-6cb6367954cb%40googlegroups.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEYi1X9MR4_CG0Eyg6%3DNtfHdh97GMqm3YXnNQDPtoFgL2oYH1g

Indexing geo_shapes with low precision

2015-02-18 Thread Russ Cam
I am indexing documents that contain a field of geo_shape type. I am 
currently indexing with a tree_level of 10 as this is the level of accuracy 
that *would be* needed for geospatial queries using that field in the 
future. However, at this point in time, there is no requirement to run 
queries against that field so in effect, I am only interested in persisting 
the geojson within the index to use for presentation purposes when 
retrieving documents.

I am considering using a tree_level of 1 for the field instead of 10. Am I 
correct in thinking:

1. This will reduce the size of the index on disk because fewer geohashes 
will need to be generated and persisted for tree_level 1 compared to 10.

2. The geo_shape json returned for a given document will be the same for 
tree_level 1 as it would be for 10, i.e. the field's source data is 
unaffected but the underlying geohash representation will be.

3. The accuracy of geo_shape queries on the field will be greatly reduced 
(as noted above, not a concern for the time being).

Any thoughts greatly appreciated!
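
For reference, a low-precision mapping sketch along these lines (note the mapping parameter is spelled tree_levels; the type and field names mirror the Feature example from the other thread but are otherwise assumptions). Since _source is stored verbatim, point 2 should hold regardless of the precision chosen:

```json
{
  "mappings": {
    "Feature": {
      "properties": {
        "geometry": {
          "type": "geo_shape",
          "tree": "geohash",
          "tree_levels": 1
        }
      }
    }
  }
}
```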

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/c3153003-9ad9-4d4e-a3cd-04bd299757ef%40googlegroups.com.


Re: Node names the same?

2015-02-18 Thread ElasticSearch Noobie
Anyone? I'm sorry if this is a silly question, I'm still trying to get my 
head around this.

On Wednesday, February 18, 2015 at 7:22:46 PM UTC+11, Marcos Georgopoulos 
wrote:
>
> Hi All, 
>
> I am starting up 3 elasticsearch services to try to utilise the large 
> amount of memory my server has (~390GB). 
>
> I have just made copies of the /etc/init.d script and changed them to 
> point to their own configuration files. Then I ran update-rc.d to ensure 
> they run on boot. 
>
> I can see the 3 processes running, all pointing to their own config files. 
>
>
> Each configuration file has the same cluster name 
>
>
>
> ### Cluster 
> ### 
>
> # Cluster name identifies your cluster for auto-discovery. If you're 
> running 
> # multiple clusters on the same network, make sure you're using unique 
> names. 
> # 
> cluster.name: elasticsearch 
>
>
> However I've given them different Node names.  i.e Node1, Node2 and Node3 
>
>  Node 
> # 
>
> # Node names are generated dynamically on startup, so you're relieved 
> # from configuring them manually. You can tie this node to a specific 
> name: 
> # 
> node.name: Node1 
>
> However, when I run 'curl localhost:9200/_nodes/process?pretty' 
>  I notice they all have the same node name. Is that correct? Also, does 
> everything else look ok? 
>
>
> { 
>   "cluster_name" : "elasticsearch", 
>   "nodes" : { 
> "h74xaIssRkui4_jO-kwsfA" : { 
>   "name" : "Node1", 
>   "transport_address" : "inet[/10.166.132.11:9302]", 
>   "host" : "msi-n-logz-1", 
>   "ip" : "10.166.132.11", 
>   "version" : "1.4.2", 
>   "build" : "927caff", 
>   "http_address" : "inet[/10.166.132.11:9202]", 
>   "attributes" : { 
> "master" : "true" 
>   }, 
>   "process" : { 
> "refresh_interval_in_millis" : 1000, 
> "id" : 1892, 
> "max_file_descriptors" : 65535, 
> "mlockall" : false 
>   } 
> }, 
> "8fhp_JSCRZS27gdXNa8KDg" : { 
>   "name" : "Node1", 
>   "transport_address" : "inet[/10.166.132.11:9301]", 
>   "host" : "msi-n-logz-1", 
>   "ip" : "10.166.132.11", 
>   "version" : "1.4.2", 
>   "build" : "927caff", 
>   "http_address" : "inet[/10.166.132.11:9201]", 
>   "attributes" : { 
> "master" : "true" 
>   }, 
>   "process" : { 
> "refresh_interval_in_millis" : 1000, 
> "id" : 1660, 
> "max_file_descriptors" : 65535, 
> "mlockall" : false 
>   } 
> }, 
> "_-R2UbIoQNurnZA8_ibLxg" : { 
>   "name" : "Node1", 
>   "transport_address" : "inet[/10.166.132.11:9300]", 
>   "host" : "msi-n-logz-1", 
>   "ip" : "10.166.132.11", 
>   "version" : "1.4.2", 
>   "build" : "927caff", 
>   "http_address" : "inet[/10.166.132.11:9200]", 
>   "attributes" : { 
> "master" : "true" 
>   }, 
>   "process" : { 
> "refresh_interval_in_millis" : 1000, 
> "id" : 1636, 
> "max_file_descriptors" : 65535, 
> "mlockall" : false 
>   } 
> }, 
> "xPNfdj-rR_SrEbgpxwdWEg" : { 
>   "name" : "logstash-msi-n-logz-1-625-2018", 
>   "transport_address" : "inet[/10.166.132.11:9303]", 
>   "host" : "msi-n-logz-1", 
>   "ip" : "10.166.132.11", 
>   "version" : "1.1.1", 
>   "build" : "f1585f0", 
>   "attributes" : { 
> "client" : "true", 
> "data" : "false" 
>   }, 
>   "process" : { 
> "refresh_interval_in_millis" : 1000, 
> "id" : 625, 
> "max_file_descriptors" : 65535, 
> "mlockall" : false 
>   } 
> } 
>   } 
> } 
>
> Many thanks!

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/478cc5dd-a9b0-4d3e-9bf8-6cb6367954cb%40googlegroups.com.


Re: the effective way to update elasticsearch.yml

2015-02-18 Thread Mark Walkom
There are some settings you can update dynamically so I tried to see if I
could update this one dynamically.
It initially looked like my tests worked, hence my comment, but further
testing shows I was going down the wrong path. So apologies about that :)

You will need to update your config and restart the process for it to be
read.
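
A rough sketch of what that rolling change could look like on each host (commands assume a Debian-style package install; adapt as needed):

```shell
# Add the option to the config file
echo 'http.compression: true' | sudo tee -a /etc/elasticsearch/elasticsearch.yml

# Restart the node so the setting is read
sudo service elasticsearch restart

# Verify the node now honours Accept-Encoding
curl -s -o /dev/null -D - -H 'Accept-Encoding: gzip' localhost:9200/ | grep -i content-encoding
```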

On 19 February 2015 at 12:39, Xiaolin Xie  wrote:

> Hi Mark
>
> I am a little bit confused here. The option (http.compression) makes
> elasticsearch return a gzipped http response to the client when possible (with
> Accept-Encoding). Why does it require closing indices? What do you
> mean by "close" here?
>
> In addition, what is the http endpoint for enabling this option?
>
> Thanks a lot for your help
>
> Xiaolin.
>
> On Wednesday, February 18, 2015 at 4:18:31 PM UTC-8, Mark Walkom wrote:
>>
>> Turns out you can enable this dynamically, but you have to close all your
>> indices for it to be accepted.
>>
>> On 19 February 2015 at 10:08, Xiaolin Xie  wrote:
>>
>>> Hi Mark
>>>
>>> Thanks a lot for the quick response. Does elasticsearch itself have a
>>> RESTful API for me to enable the option "http.compression"? That way, one http
>>> request to the ES cluster would do the job for me.
>>>
>>> Xiaolin.
>>>
>>> On Wednesday, February 18, 2015 at 2:10:32 PM UTC-8, Mark Walkom wrote:

 Yes this is an update and restart.

 You should really be using things like puppet/chef/ansible to manage
 config :)

 On 19 February 2015 at 08:51, Xiaolin Xie  wrote:

> Hi ES guys
>
> I am a n00b to elasticsearch. We have a cluster of ES nodes(24 hosts).
> I need to enable the "http.compression" to for this cluster. Do I have to
> manually edit the elasticsearch.yml file in each host and then restart
> elasticserach service in each host to take the configuration change? Is
> there an easier way for me to do that, such as a http endpoint to enable
> the "http.compression"?
>
> Thanks a lot for the help.
>
> Xiaolin.
>
> --
> You received this message because you are subscribed to the Google
> Groups "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to elasticsearc...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/elasticsearch/e2f1fdb0-f8a3-436d-8c22-0f1461ab189d%40goo
> glegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

  --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit https://groups.google.com/d/
>>> msgid/elasticsearch/54972514-f883-41d3-b90a-7d6080bce111%
>>> 40googlegroups.com
>>> 
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/d7dc7739-4c59-4483-82b0-0efdadb2b39c%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEYi1X_rVMNgXRrTgBA5iHQ2%2BWCTxGDjuH9%2BjFG4ytVrNzPRvg%40mail.gmail.com.


Re: Eliminate query terms based on their document frequency!

2015-02-18 Thread sri krishna

  Does anyone have an idea on the above?

On Thursday, 19 February 2015 02:36:09 UTC+5:30, sri krishna wrote:
>
> Hi,
>
> How can we configure Elasticsearch to eliminate query terms for a specific 
> field based on a document frequency threshold?
>
> For example, for the query "*title*:test AND *title*:west AND *desc*:world AND 
> *desc*:hello", assume the document frequency threshold is set to 10 and a few 
> terms in the query, i.e. *desc*:world and *title*:test, have a document 
> frequency greater than 10. The query should then be changed to 
> "*title*:west AND *desc*:hello".
>
> One approach is to query each term individually and, based on the retrieved 
> document count, eliminate the terms exceeding the given document frequency 
> threshold, but this is not efficient as it drastically increases the number 
> of searches! 
>
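
One built-in feature that may be worth a look here is the common terms query (available in the 1.x DSL). It does not drop high-frequency terms outright, but it demotes them to optional clauses based on a cutoff_frequency, which can be given as an absolute document frequency; the sketch below is only an assumption about how it could map onto the example above:

```json
{
  "query": {
    "bool": {
      "must": [
        { "common": { "title": { "query": "test west",   "cutoff_frequency": 10 } } },
        { "common": { "desc":  { "query": "world hello", "cutoff_frequency": 10 } } }
      ]
    }
  }
}
```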

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/8cdc00a9-6ec1-4f95-84dc-3c4477bafd27%40googlegroups.com.


Re: Cluster issue in production, split brain?

2015-02-18 Thread Nara Alzapur
We are on version 1.3.2. I have looked in other threads for similar issues, 
but they seem to be about much older versions. Please let me know how to 
avoid such split-brain issues. Thank you.
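
With minimum_master_nodes already at 2 on a 3-node cluster, the quorum math is right; known zen-discovery issues in older 1.x releases are a plausible culprit, so upgrading is worth considering. In the meantime, a quick sketch of how to double-check the live setting on each node (1.x endpoints assumed):

```shell
# Verify the setting each node is actually running with
curl -s 'localhost:9200/_nodes/_local/settings?pretty' | grep minimum_master_nodes

# The setting can also be applied dynamically cluster-wide
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "persistent": { "discovery.zen.minimum_master_nodes": 2 }
}'
```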

On Wednesday, February 18, 2015 at 7:18:29 PM UTC-6, Nara Alzapur wrote:
>
> Thank you for the reply. Yes, they are in the same DC. I thought the 
> (n/2) + 1 quorum rule would apply and such an issue would not occur. I 
> wonder how a node can be part of 2 clusters at the same time? Are there 
> any config changes I can make to avoid such an issue?
>
>
> On Wednesday, February 18, 2015 at 7:13:39 PM UTC-6, Mark Walkom wrote:
>>
>> This is a split brain scenario.
>>
>> Are your nodes in the same DC?
>>
>> On 19 February 2015 at 11:40, Nara Alzapur  wrote:
>>
>>> I saw an issue with cluster in production today. we have a cluster of 3 
>>> ( master & data ) nodes and min number of master nodes is set to 2 in 
>>> config. 
>>> at some point due to network issue, server 3 couldnt connect to 2, but 
>>> could connect to 1..so it had an active cluster with 1 & 3 and 3 is 
>>> primary. On 2 bigdesk showed, cluster with 1 & 2 and 2 being primary. is 
>>> this a split brain scenario?
>>>
>>> i assume not because same cluster name and 1 is part of both. bigdesk on 
>>> 2 & 3 showed green. however we saw disc IO shot up to 100%, and 
>>> inconsistency on displaying data depending on which node the data was read 
>>> from. After i restarted service on node 3, it was all fine, all 3 nodes 
>>> were in cluster on bigdesk on all machines.
>>>
>>> can anyone please help me how cluster can get into such state, and how 
>>> to avoid such scenario?
>>>
>>> thank you.
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elasticsearch/cb3f75e8-49f5-48fc-be93-9bb444ee56df%40googlegroups.com
>>>  
>>> 
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/e9d75ac2-451f-496c-adef-b630d1f9dc8d%40googlegroups.com.


Adding shards to reduce average size

2015-02-18 Thread Jonathan Foy
Hello

I have been struggling off and on for months now to run a heavily updated 
ES cluster on as little hardware as possible.  There's more history here, 
but the overall issue is that my use case involves lots of frequent 
updates, which results in a growing percentage of deleted docs, and thus 
more RAM needed to handle the same dataset.  I want to run a possible 
change in index structure by the group to get any recommendations.

Ultimately the growing delete percentage seems to be related to both the 
size of the shards in question, and a hard-coded ratio in the Lucene Tiered 
Merge Policy code.  As it turns out, the merge code will not consider 
merging a segment until the non-deleted portion of the segment is smaller 
than half of the max_merged_segment parameter (someone please correct me if 
I'm wrong here, and I wish it had been easier to discover this).  This does 
not appear to be configurable in any version of Lucene in which I've 
looked.  This explains why bumping max_merged_segment stupidly high can 
keep the deleted percentage low; the larger segments are considered for 
merging when they have a smaller percentage of deleted docs.  This also 
explains why my shards grew to be ~45% deleted docs.

To work around this, I'm considering reindexing with 3 or 4 times the 
number of shards on the same number of nodes to shrink the shards to a more 
manageable size, but want to make sure that the overhead from additional 
shards will not counter the benefits from having smaller shards.  Note that 
I AM using routing, which I assume should reduce much of the over-sharding 
overhead.

The index in question now has six shards with one replica each.  Shard 
sizes range from 20-30 GB.  Currently this is running on six r3.xlarge 
(30.5 GB RAM, 20GB allocated to ES) instances.  Ideally I'd be running on 
no more than four.  With a low enough delete percentage I think I can even 
run on three.

My theory is that if I reduce the size of the shards to 5-10 GB max, then I 
can set the max_merged_segment high enough that the deleted docs will be 
merged out quickly, and I'll be able to handle the size of the merges in 
question without issue.  With the existing setup, even on six nodes, I 
occasionally run into merge memory issues with large max_merged_segment 
sizes (13 GB currently I think).

Does this seem reasonable?  I'd be looking at 36-48 shards, counting 
replicas, being spread across 3-4 nodes, so 12-18 each.  Thoughts?
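
For what it's worth, with smaller shards the merge-policy knobs below become easier to reason about; both are documented 1.x TieredMergePolicy settings, though the index name and values here are purely illustrative:

```shell
curl -XPUT 'localhost:9200/myindex/_settings' -d '{
  "index.merge.policy.max_merged_segment": "5gb",
  "index.merge.policy.reclaim_deletes_weight": 3.0
}'
```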

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/39668eef-db62-4ffc-968d-88e04f1712bd%40googlegroups.com.


Re: One alias with multiple indices

2015-02-18 Thread tao hiko
Hi Colin,

Thank you for your advice that is a good solution.

Thank you,
Hiko

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d46c709c-6940-4db9-9948-d2fb93dc50d6%40googlegroups.com.


Re: Decreasing Heap Size Results in Better TPS, How can this happen??

2015-02-18 Thread Srikanth Valiveti
We thought the same about the OS cache, but that was proved wrong, as the
graphs for OS cache usage look almost the same in both
cases.
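
Per Mike's advice, the usual knob is the ES_HEAP_SIZE environment variable (its location varies by install; the value here is illustrative), after which heap and GC behaviour can be compared from the node stats:

```shell
# e.g. in /etc/default/elasticsearch or the service environment
export ES_HEAP_SIZE=16g

# Compare heap usage and GC collection counts/times across runs
curl -s 'localhost:9200/_nodes/stats/jvm?pretty'
```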

On Thu, Feb 19, 2015 at 3:06 AM, Michael McCandless 
wrote:

> Smaller JVM heap means more free RAM for the OS to cache hot pages from
> your index ... in general you should only give the JVM as much as it needs
> (will ever need) and a bit more for safety, and give the rest to the OS so
> it can put hot parts of your index in RAM.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Wed, Feb 18, 2015 at 3:50 PM, sri krishna 
> wrote:
>
>> There are a couple of questions pertaining to the above graphs:
>>
>> 1) Yes, the GC time taken is doubled, but the frequency of GC cycles
>> is higher (almost double/triple) at 16gb compared to 30gb.
>>
>> 2) Why is the full heap not utilized before the memory is cleaned up? For
>> example, at 30gb the maximum usage is around 20gb; at 16gb it is 11gb.
>>
>> 3) This looks like thrashing! Is it because of the large index size (76.6GB)
>> on a single host?
>>
>> PS: The GC used here is G1GC
>>
>>
>> On Thursday, 19 February 2015 00:54:25 UTC+5:30, Jörg Prante wrote:
>>>
>>> So you believe in "the more heap the better"? This assumption you have
>>> just proved wrong. Note, in your test, at 16GB heap, GC took half a second,
>>> while at 30GB heap, GC took around a second (and maybe more overhead in
>>> stop-the-world pauses). This is a hint about current JVM GC scalability of
>>> Java 7.
>>>
>>> Jörg
>>>
>>> On Wed, Feb 18, 2015 at 7:45 PM, Srikanth Valiveti <
>>> vsrikant...@gmail.com> wrote:
>>>
 We have 5 shards(give each size) totaling to 76.6GB in a single host
 c3.8xlarge  system (60gb
 ram, 32 core, 2*320 SSD )

 We have multiple fields in our record, but a single ngram-analyzed
 field is what we search on.  This search needs to be performed on
 all 5 shards of the host to get results, as there is no routing in our 
 case.

 We observed huge variations in search TPS with *decrease* in elastic
 search heap memory size. Attached bigdesk images for both of the below
 cases!

 CASE1)
 When ess_heap size = 16gb
 Search tps observed is  50

 CASE2)
 When ess_heap_size=30gb
 Search tps observed is  18

 The surprising thing is that as we decrease ess_heap_size, the search TPS
 increases. The resources (cpu, memory, etc.) are not fully utilized, OS
 cache memory does not change much, and we observed a lot of zig-zag ess_heap
 usage (increase and decrease of ES heap usage, maybe because of the high index
 size that needs to be brought into RAM); read I/Os followed the same zig-zag
 pattern in both cases.

 Please note that we have run this experiment *multiple* times and
 observed the same pattern. Can you please guide us on what is going wrong?
 Why does decreasing ess_heap increase the TPS? Should we decrease it further
 to achieve better TPS, or are we doing something wrong?

 -Thanks
 Srikanth V.

 --
 You received this message because you are subscribed to the Google
 Groups "elasticsearch" group.
 To unsubscribe from this group and stop receiving emails from it, send
 an email to elasticsearc...@googlegroups.com.
 To view this discussion on the web visit https://groups.google.com/d/
 msgid/elasticsearch/f1d1bf51-ab4d-4242-94e5-4b5c3adf466c%
 40googlegroups.com
 
 .
 For more options, visit https://groups.google.com/d/optout.

>>>
>

Re: Efficient sorting by distance to a point

2015-02-18 Thread Nicholas Gledhill
Possibly should have specified - elasticsearch 1.4.2



Efficient sorting by distance to a point

2015-02-18 Thread Nicholas Gledhill
Situation: we have an index of "Features" that contain Polygon and 
MultiPolygon coordinate descriptions of geographical spaces on the Earth's 
surface.

Some of these "Features" contain a "point" value that represents a fairly 
arbitrary point inside the given Polygon... but most don't. Most are just 
described by their border coordinates.

I've designed a query to return a list of Features that are near a certain 
point (or rather, within a certain distance of that point).

That's all fairly straightforward.

What we also want to do is return those points listed in ORDER of distance 
from the point given.

"Distance from the point" would preferably be "distance to the closest 
point on the border described in the Polygon (or Multi Polygon)".

But it could also (more efficiently) be the "distance to the nearest point 
in the Polygon (or MultiPolygon) coordinate list".

I have written a Groovy script to calculate a score based on the second of 
these two options. It works, but it is not performant; neither version is. I 
started with one ("geo-distance-score"), which ended up being far simpler 
than I wanted it to be, and then simplified it further 
("estimate-distance-score") to show that no matter how simple I made the 
calculation, it was the act of scoring the documents at all that was causing 
the problem, not the computational complexity of the algorithm.

Does anyone have any ideas for how to make this more efficient? Any and all 
ideas - even the slightly crazy ones - are welcome and will be considered. 
:-)
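For reference, the "distance to the nearest coordinate in the polygon" idea is
easy to express outside Elasticsearch. A hedged Python sketch (the haversine
math is standard; the data shapes, GeoJSON-style `(lon, lat)` vertex pairs,
are my assumption, not taken from the attached Groovy scripts):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (lat, lon) points, in kilometres.
    r = 6371.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest_vertex_km(lat, lon, polygon):
    # polygon: list of (lon, lat) vertex pairs, GeoJSON ordering.
    # Returns the distance to the closest vertex (option 2 in the post above).
    return min(haversine_km(lat, lon, v[1], v[0]) for v in polygon)
```

The script-scoring cost in ES comes from running logic like this once per
matching document, which is why even a trivial script is slow over many hits.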

I'll paste the Elasticsearch query below, and attach 2 slightly different 
versions of the Groovy script I was playing around with. Details in the 
comments in those files.

Thanks!


ES Query:
{
   "query": {
      "function_score": {
         "boost_mode": "replace",
         "query": {
            "match_all": {}
         },
         "filter": {
            "bool": {
               "must": [
                  {
                     "term": {
                        "status": "usable"
                     }
                  },
                  {
                     "geo_shape": {
                        "geometry": {
                           "relation": "intersects",
                           "shape": {
                              "type": "circle",
                              "coordinates": [
                                 151.186209,
                                 -33.892861
                              ],
                              "radius": "100km"
                           }
                        }
                     }
                  }
               ]
            }
         },
         "script_score": {
            "script": "geo-distance-score",
            "params": {
               "lat": -33.892861,
               "lon": 151.186209
            }
         }
      }
   },
   "size": 10
}
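One slightly crazy idea to consider: if a representative geo_point were
indexed alongside each shape (for instance the "point" value some Features
already have), the native _geo_distance sort could replace the script_score
entirely. A hedged sketch, assuming a field named "point" mapped as
geo_point (that mapping is my assumption, not confirmed by the post):

{
   "query": { "...same filtered query as above..." : {} },
   "sort": [
      {
         "_geo_distance": {
            "point": { "lat": -33.892861, "lon": 151.186209 },
            "order": "asc",
            "unit": "km"
         }
      }
   ],
   "size": 10
}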



estimate-distance-score.groovy
Description: Binary data


geo-distance-score.groovy
Description: Binary data


Re: Elasticsearch performance tuning

2015-02-18 Thread Mark Walkom
1. It depends
2. It depends
3. It depends
4. It also depends.

The performance of ES depends on you: your data, your usage, your
queries, your hardware, your configuration. If those are the results you got,
then they are indicative of your setup and are thus your benchmark; from
there you can tweak and try to improve performance.

Monitoring LS is a little harder as there are no APIs for it (yet). Most of
its performance will come down to your filters (especially grok).

On 18 February 2015 at 20:48, Deva Raj  wrote:

> Hi All,
>
>  In a single-node Elasticsearch setup along with Logstash, we tested parsing
> 20MB and 200MB files into Elasticsearch on different AWS
> instance types, i.e., Medium, Large and Xlarge.
>
> Environment Details : Medium instance 3.75 RAM  1 cores Storage :4 GB SSD
> 64-bit Network Performance: Moderate
> Instance running with : Logstash, Elastic search
>
> Scenario: 1
>
> **With default settings**
> Result :
> 20mb logfile 23 mins Events Per/second 175
> 200mb logfile 3 hrs 3 mins Events Per/second 175
>
>
> Added the following to settings:
> Java heap size : 2GB
> bootstrap.mlockall: true
> indices.fielddata.cache.size: "30%"
> indices.cache.filter.size: "30%"
> index.translog.flush_threshold_ops: 5
> indices.memory.index_buffer_size: 50%
>
> # Search thread pool
> threadpool.search.type: fixed
> threadpool.search.size: 20
> threadpool.search.queue_size: 100
>
> **With added settings**
> Result:
> 20mb logfile 22 mins Events Per/second 180
> 200mb logfile 3 hrs 07 mins Events Per/second 180
>
> Scenario 2
>
> Environment Details : R3 Large 15.25 RAM  2 cores Storage :32 GB SSD
> 64-bit Network Performance: Moderate
> Instance running with : Logstash, Elastic search
>
> **With default settings**
> Result :
>   20mb logfile 7 mins Events Per/second 750
>   200mb logfile 65 mins Events Per/second 800
>
> Added the following to settings:
> Java heap size: 7gb
> other parameters same as above
>
> **With added settings**
> Result:
> 20mb logfile 7 mins Events Per/second 800
> 200mb logfile 55 mins Events Per/second 800
>
> Scenario 3
>
> Environment Details :
> R3 High-Memory Extra Large r3.xlarge 30.5 RAM 4 cores Storage :32 GB SSD
> 64-bit Network Performance: Moderate
> Instance running with : Logstash, Elastic search
>
> **With default settings**
>   Result:
>   20mb logfile 7 mins Events Per/second 1200
>   200mb logfile 34 mins Events Per/second 1200
>
>  Added the following to settings:
> Java heap size: 15gb
> other parameters same as above
>
> **With added settings**
> Result:
> 20mb logfile 7 mins Events Per/second 1200
> 200mb logfile 34 mins Events Per/second 1200
>
> I wanted to know:
>
> 1. What is the benchmark for the performance?
> 2. Does the performance meet the benchmark, or is it below it?
> 3. Why, even after I increased the Elasticsearch JVM heap, am I not able to
> see a difference?
> 4. How do I monitor Logstash and improve its performance?
>
> I appreciate any help on this, as I am new to Logstash and Elasticsearch.
>
>
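For what it's worth, the events-per-second figures quoted above are easy to
sanity-check from an event count and wall-clock time. A minimal sketch (the
event count here is illustrative, chosen to match the 175 events/s, 23 min
run, and is not taken from the post):

```python
def events_per_second(event_count, minutes):
    # Throughput = events divided by elapsed seconds.
    return event_count / (minutes * 60)

# ~241,500 events over 23 minutes works out to 175 events/s.
print(round(events_per_second(241500, 23)))
```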



Re: Help with 4 node cluster

2015-02-18 Thread Nara Alzapur
You can make all nodes master + data by setting the following in the 
elasticsearch.yml file. I don't think you can designate specific nodes as 
primary in a cluster. For each index you can set the number of replicas to 1 
or more, and ES will automatically place them on different nodes than the 
primary shard. I recommend using unicast for discovery, as shown below, if 
you have static IPs.

node.master: true
node.data: true
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["IP1", "IP2"]
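
The replica count itself is a per-index, dynamically updatable setting. A
hedged curl sketch (the index name is a placeholder):

curl -XPUT 'http://localhost:9200/my_index/_settings' -d '{
  "index": { "number_of_replicas": 1 }
}'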


On Wednesday, February 18, 2015 at 5:39:54 AM UTC-6, sysads wrote:
>
> Hi
>
> I am in need of help on setting up a 4 node elasticsearch servers. I have 
> installed and configured ES on all 4 nodes but I am lost as to what the 
> configuration in elasticsearch.yml will be:
>
> - if I want to have all 4 nodes both master and data
> - make node A act as primary shard while node B acts as its replica then 
> node C as primary shard while node D as its own replica.
>
> Thanks
>



Re: elastic search cluster behind azure public load balncer

2015-02-18 Thread Mark Walkom
Yes the master can serve requests.

You don't really want 2 masters and 1 data node though, make all 3
master+data to start with.
And sure the client can be a SPOF, but then isn't a single load balancer a
SPOF as well? So the question remains, where are you happy dealing with
these points, because at some point you cannot make *everything* redundant
without being excessive.

On 18 February 2015 at 21:36, Subodh Patil  wrote:

>
> I am trying to set up an ES cluster behind an Azure load balancer / cloud
> service. The cluster has 3 nodes with no specific data/client/master node
> settings. By default 2 nodes are elected as master-eligible and 1 as a data
> node.
>
> Requests (create/update/search) from the application come to the Azure load
> balancer on port 9200, which is balanced across all 3 VMs, so a request can
> go to any VM.
>
> Will master node be able to serve the requests ?
>
> Many articles say that you don't need a load balancer for an ES cluster,
> just use a client node, but then that becomes a single point of failure, as
> an Azure VM can go down at any point in time. So load balancing is required
> mainly for high availability from an infrastructure point of view.
> Please suggest a cluster setup and which nodes (data or client) to put
> behind the load balancer.
>
>
> ES version 1.4.1 on windows server 2012 r2 vm
>
>
>



Re: the effective way to update elasticsearch.yml

2015-02-18 Thread Xiaolin Xie
Hi Mark

I am a little bit confused here. The option (http.compression) makes 
elasticsearch return a "zipped" HTTP response to the client when possible 
(based on Accept-Encoding). Why does it require closing the indices? What do 
you mean by "close" here? 

In addition, what is the HTTP endpoint for enabling this option? 

Thanks a lot for your help

Xiaolin.

On Wednesday, February 18, 2015 at 4:18:31 PM UTC-8, Mark Walkom wrote:
>
> Turns out you can enable this dynamically, but you have to close all your 
> indices for it to be accepted.
>
> On 19 February 2015 at 10:08, Xiaolin Xie 
> > wrote:
>
>> Hi Mark
>>
>> Thanks a lot for the quick response. Does elasticsearch itself have any 
>> RESTful API for me to enable the option "http.compression"?  Then one http 
>> request to the ES cluster would do the job for me.
>>
>> Xiaolin.
>>
>> On Wednesday, February 18, 2015 at 2:10:32 PM UTC-8, Mark Walkom wrote:
>>>
>>> Yes this is an update and restart.
>>>
>>> You should really be using things like puppet/chef/ansible to manage 
>>> config :)
>>>
>>> On 19 February 2015 at 08:51, Xiaolin Xie  wrote:
>>>
 Hi ES guys

 I am a n00b to elasticsearch. We have a cluster of ES nodes (24 hosts). 
 I need to enable "http.compression" for this cluster. Do I have to 
 manually edit the elasticsearch.yml file on each host and then restart the 
 elasticsearch service on each host to pick up the configuration change? Is 
 there an easier way to do that, such as an HTTP endpoint to enable 
 "http.compression"?

 Thanks a lot for the help.

 Xiaolin.


>>>
>>
>
>



Re: Help with 4 node cluster

2015-02-18 Thread Mark Walkom
Take a look at these pages -
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/deploy.html

Also there is not really any point in forcing shard types on specific
machines, unless you want to leverage some kind of awareness. A primary and
replica shard are the same thing other than a metadata flag that
differentiates them.
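
You can see that flag for yourself with the _cat/shards API (the "prirep"
column marks each shard p for primary or r for replica); a hedged sketch,
output columns illustrative for 1.x:

curl 'http://localhost:9200/_cat/shards?v'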

On 18 February 2015 at 22:39, sysads 
wrote:

> Hi
>
> I am in need of help on setting up a 4 node elasticsearch servers. I have
> installed and configured ES on all 4 nodes but I am lost as to what the
> configuration in elasticsearch.yml will be:
>
> - if I want to have all 4 nodes both master and data
> - make node A act as primary shard while node B acts as its replica then
> node C as primary shard while node D as its own replica.
>
> Thanks
>
>



Re: Elasticsearch Issue with custom json input data using logstash

2015-02-18 Thread jlam
I'm using logstash.

On the client, I setup the input for logstash with json codec and output to 
redis server.  There is a logstash instance to pop from redis list into ES.

Thanks,
Jared

On Wednesday, February 18, 2015 at 5:14:15 PM UTC-8, Mark Walkom wrote:
>
> How are you feeding the json logs into ES?
>
> On 19 February 2015 at 10:56, > wrote:
>
>> The CPU usage also jumps considerably.
>>
>> Thanks,
>> Jared
>>
>>
>> On Wednesday, February 18, 2015 at 3:50:11 PM UTC-8, jl...@bills.com 
>> wrote:
>>>
>>> Hello Everyone,
>>>
>>> I'm hoping I might get some help with Elasticsearch.  I'm seeing 
>>> performance issues with Elasticsearch.
>>>
>>> With our current setup:
>>> We have Elasticsearch (1.4.3), redis, and logstash installed on the same 
>>> server with 30GB of memory.  The ES_HEAP_SIZE is set to 15GB.  Each server 
>>> has logstash installed and pushes its logs to redis.  The logstash on the 
>>> ES server picks up logs from redis and pushes them to Elasticsearch.
>>>
>>> We are logging apache logs on all the web servers without any 
>>> performance issues.  Kibana works fine and performance is pretty fast.
>>>
>>> Here is the issue:
>>> We want to do custom application logging.  The logs are in json format.  
>>> When Elasticsearch gets 
>>> "org.elasticsearch.index.mapper.MapperParsingException: 
>>> failed to parse" exceptions, performance really degrades and becomes 
>>> unusable.  Redis consumes more and more memory.  Elasticsearch reaches a 
>>> point where it is constantly doing GC.  Restarting Elasticsearch doesn't 
>>> help.
>>>
>>> The dataset is not that big compared to others.  The daily size of the 
>>> dataset is probably 2GB to 3GB of logs.
>>>
>>> It seems that when Elasticsearch has problems executing bulk index 
>>> items, performance degrades considerably.
>>>
>>> I'm wondering if there are any recommendations on Elasticsearch and 
>>> logstash configuration.
>>>
>>> Do I need to alter the logstash mapping?
>>>
>>> Thanks,
>>> Jared
>>>
>>>
>>
>
>



Re: Cluster issue in production, split brain?

2015-02-18 Thread Nara Alzapur
Thank you for the reply. Yes, they are in the same DC. I thought the 
(n/2) + 1 quorum rule would apply and such an issue would not occur. I wonder 
how a node can be part of 2 clusters at the same time. Are there any config 
changes I can make to avoid such an issue?
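The quorum behind discovery.zen.minimum_master_nodes is simply "more than
half of the master-eligible nodes", i.e. floor(n/2) + 1; a minimal sketch of
the arithmetic:

```python
def minimum_master_nodes(master_eligible):
    # Quorum: strictly more than half of the master-eligible nodes.
    return master_eligible // 2 + 1

# For the 3-node cluster in this thread the correct setting is 2.
print(minimum_master_nodes(3))
```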


On Wednesday, February 18, 2015 at 7:13:39 PM UTC-6, Mark Walkom wrote:
>
> This is a split brain scenario.
>
> Are your nodes in the same DC?
>
> On 19 February 2015 at 11:40, Nara Alzapur 
> > wrote:
>
>> I saw an issue with cluster in production today. we have a cluster of 3 ( 
>> master & data ) nodes and min number of master nodes is set to 2 in config. 
>> at some point due to network issue, server 3 couldnt connect to 2, but 
>> could connect to 1..so it had an active cluster with 1 & 3 and 3 is 
>> primary. On 2 bigdesk showed, cluster with 1 & 2 and 2 being primary. is 
>> this a split brain scenario?
>>
>> i assume not because same cluster name and 1 is part of both. bigdesk on 
>> 2 & 3 showed green. however we saw disc IO shot up to 100%, and 
>> inconsistency on displaying data depending on which node the data was read 
>> from. After i restarted service on node 3, it was all fine, all 3 nodes 
>> were in cluster on bigdesk on all machines.
>>
>> can anyone please help me how cluster can get into such state, and how to 
>> avoid such scenario?
>>
>> thank you.
>>
>>
>
>



Re: Elasticsearch Issue with custom json input data using logstash

2015-02-18 Thread Mark Walkom
How are you feeding the json logs into ES?

On 19 February 2015 at 10:56,  wrote:

> The CPU usage also jumps considerably.
>
> Thanks,
> Jared
>
>
> On Wednesday, February 18, 2015 at 3:50:11 PM UTC-8, jl...@bills.com
> wrote:
>>
>> Hello Everyone,
>>
>> I'm hoping I might get some help with Elasticsearch.  I'm seeing
>> performance issues with Elasticsearch.
>>
>> With our current setup:
>> We have Elasticsearch (1.4.3), redis, and logstash installed on the same
>> server with 30GB of memory.  The ES_HEAP_SIZE is set to 15GB.  Each server
>> has logstash installed and pushes its logs to redis.  The logstash on the
>> ES server picks up logs from redis and pushes them to Elasticsearch.
>>
>> We are logging apache logs on all the web servers without any performance
>> issues.  Kibana works fine and performance is pretty fast.
>>
>> Here is the issue:
>> We want to do custom application logging.  The logs are in json format.
>> When Elasticsearch gets 
>> "org.elasticsearch.index.mapper.MapperParsingException:
>> failed to parse" exceptions, performance really degrades and becomes
>> unusable.  Redis consumes more and more memory.  Elasticsearch reaches a
>> point where it is constantly doing GC.  Restarting Elasticsearch doesn't
>> help.
>>
>> The dataset is not that big compared to others.  The daily size of the
>> dataset is probably 2GB to 3GB of logs.
>>
>> It seems that when Elasticsearch has problems executing bulk index items,
>> performance degrades considerably.
>>
>> I'm wondering if there are any recommendations on Elasticsearch and
>> logstash configuration.
>>
>> Do I need to alter the logstash mapping?
>>
>> Thanks,
>> Jared
>>
>>
>
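One cheap mitigation for the MapperParsingException flood described above is
to validate events client-side before they ever reach redis, so malformed
lines never hit the bulk indexer. A hedged Python sketch (this is a generic
pre-filter idea, not a feature of logstash itself):

```python
import json

def valid_json_events(lines):
    # Yield only lines that parse as JSON; drop the rest instead of
    # letting them fail at Elasticsearch index time.
    for line in lines:
        try:
            yield json.loads(line)
        except ValueError:
            continue
```

Type conflicts (e.g. a field that is sometimes a string, sometimes an object)
would still need a mapping or template fix, since those parse as valid JSON.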



Re: Cluster issue in production, split brain?

2015-02-18 Thread Mark Walkom
This is a split brain scenario.

Are your nodes in the same DC?

On 19 February 2015 at 11:40, Nara Alzapur  wrote:

> I saw an issue with cluster in production today. we have a cluster of 3 (
> master & data ) nodes and min number of master nodes is set to 2 in config.
> at some point due to network issue, server 3 couldnt connect to 2, but
> could connect to 1..so it had an active cluster with 1 & 3 and 3 is
> primary. On 2 bigdesk showed, cluster with 1 & 2 and 2 being primary. is
> this a split brain scenario?
>
> i assume not because same cluster name and 1 is part of both. bigdesk on 2
> & 3 showed green. however we saw disc IO shot up to 100%, and inconsistency
> on displaying data depending on which node the data was read from. After i
> restarted service on node 3, it was all fine, all 3 nodes were in cluster
> on bigdesk on all machines.
>
> can anyone please help me how cluster can get into such state, and how to
> avoid such scenario?
>
> thank you.
>
>



Cluster issue in production, split brain?

2015-02-18 Thread Nara Alzapur
I saw an issue with a cluster in production today. We have a cluster of 3 
(master & data) nodes, and the minimum number of master nodes is set to 2 in 
the config. At some point, due to a network issue, server 3 couldn't connect 
to 2 but could connect to 1, so it had an active cluster with 1 & 3, with 3 
as primary. On 2, bigdesk showed a cluster with 1 & 2, with 2 as primary. Is 
this a split-brain scenario?

I assume not, because it is the same cluster name and 1 is part of both. 
Bigdesk on 2 & 3 showed green. However, we saw disk IO shoot up to 100%, and 
inconsistency in the displayed data depending on which node the data was 
read from. After I restarted the service on node 3, it was all fine; all 3 
nodes were in the cluster on bigdesk on all machines.

Can anyone please help me understand how a cluster can get into such a 
state, and how to avoid such a scenario?

Thank you.



Re: the effective way to update elasticsearch.yml

2015-02-18 Thread Mark Walkom
Turns out you can enable this dynamically, but you have to close all your
indices for it to be accepted.

On 19 February 2015 at 10:08, Xiaolin Xie  wrote:

> Hi Mark
>
> Thanks a lot for the quick response. Does elasticsearch itself have any
> RESTful API for me to enable the option "http.compression"?  Then one http
> request to the ES cluster would do the job for me.
>
> Xiaolin.
>
> On Wednesday, February 18, 2015 at 2:10:32 PM UTC-8, Mark Walkom wrote:
>>
>> Yes this is an update and restart.
>>
>> You should really be using things like puppet/chef/ansible to manage
>> config :)
>>
>> On 19 February 2015 at 08:51, Xiaolin Xie  wrote:
>>
>>> Hi ES guys
>>>
>>> I am a n00b to elasticsearch. We have a cluster of ES nodes (24 hosts). I
>>> need to enable "http.compression" for this cluster. Do I have to
>>> manually edit the elasticsearch.yml file on each host and then restart the
>>> elasticsearch service on each host to pick up the configuration change? Is
>>> there an easier way to do that, such as an HTTP endpoint to enable
>>> "http.compression"?
>>>
>>> Thanks a lot for the help.
>>>
>>> Xiaolin.
>>>
>>>
>>
>



ClassNotFoundException: org.elasticsearch.discovery.ec2.Ec2DiscoveryModule

2015-02-18 Thread Diana Tuck
New to ES - Trying to use the elasticsearch-cloud-aws plugin, but when 
starting my java client node, I'm getting ClassNotFoundException 
on org.elasticsearch.discovery.ec2.Ec2DiscoveryModule.   Do I need to 
install this plugin on java client nodes, and if so, how does one do that? 
 Or, rather, is there a maven dependency that can be referenced to load 
these required classes?

For reference, the elasticsearch.yml is:

plugin.mandatory: cloud-aws
cloud:
  aws:
    access_key: **
    secret_key: *
discovery:
  type: ec2

and my java client code is:

Settings settings = ImmutableSettings.settingsBuilder()
        .put("node.name", nodeName)
        .put("cloud.aws.access_key", awsAccessKey)
        .put("cloud.aws.secret_key", awsSecretKey)
        .put("cloud.node.auto_attributes", true)
        .put("discovery.type", "ec2")
        .build();
this.node = nodeBuilder()
        .clusterName(clusterName)
        .settings(settings)
        .client(true)
        .node();
this.client = node.client();
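
A client=true node runs the same discovery code as a data node, so the plugin's classes do need to be on the client's classpath. The cloud-aws plugin is published to Maven Central, so a dependency along these lines should pull them in (the version here is an assumption and must be matched to your Elasticsearch release):

```xml
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-cloud-aws</artifactId>
    <!-- assumption: the 2.4.x plugin line targets Elasticsearch 1.4.x -->
    <version>2.4.1</version>
</dependency>
```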



Re: Elasticsearch Issue with custom json input data using logstash

2015-02-18 Thread jlam
The CPU usage also jumps considerably.

Thanks,
Jared

On Wednesday, February 18, 2015 at 3:50:11 PM UTC-8, jl...@bills.com wrote:
>
> Hello Everyone,
>
> I'm hoping I might get some help with Elasticsearch.  I'm running into 
> performance issues.
>
> With our current setup:
> We have Elasticsearch (1.4.3), redis, and logstash installed on the same 
> server with 30GB of memory.  The ES_HEAP_SIZE is set to 15GB.  Each server 
> has logstash installed and pushes its logs to redis.  The logstash instance 
> on the central server picks up logs from redis and pushes them to 
> Elasticsearch.
>
> We are logging apache logs on all the web servers without any performance 
> issues.  Kibana works fine and performance is pretty fast.
>
> Here is the issue:
> We want to do custom application logging.  The logs are in json format. 
>  When Elasticsearch starts throwing 
> "org.elasticsearch.index.mapper.MapperParsingException: failed to parse" 
> exceptions, performance really degrades and the cluster becomes unusable. 
> Redis consumes more and more memory, and Elasticsearch eventually spends 
> most of its time doing GC.  Restarting Elasticsearch doesn't help.
>
> The dataset is not that big compared to others.  The daily size of the 
> dataset is probably 2GB to 3GB of logs.
>
> It seems that when Elasticsearch has problems executing bulk index items, 
> performance degrades considerably.
>
> I'm wondering if there are any recommendations on Elasticsearch and 
> logstash configuration.
>
> Do I need to alter the logstash mapping?
>
> Thanks,
> Jared
>
>
>



Elasticsearch Issue with custom json input data using logstash

2015-02-18 Thread jlam
Hello Everyone,

I'm hoping I might get some help with Elasticsearch.  I'm running into 
performance issues.

With our current setup:
We have Elasticsearch (1.4.3), redis, and logstash installed on the same 
server with 30GB of memory.  The ES_HEAP_SIZE is set to 15GB.  Each server 
has logstash installed and pushes its logs to redis.  The logstash instance 
on the central server picks up logs from redis and pushes them to 
Elasticsearch.

We are logging apache logs on all the web servers without any performance 
issues.  Kibana works fine and performance is pretty fast.

Here is the issue:
We want to do custom application logging.  The logs are in json format. 
When Elasticsearch starts throwing 
"org.elasticsearch.index.mapper.MapperParsingException: failed to parse" 
exceptions, performance really degrades and the cluster becomes unusable. 
Redis consumes more and more memory, and Elasticsearch eventually spends 
most of its time doing GC.  Restarting Elasticsearch doesn't help.

The dataset is not that big compared to others.  The daily size of the 
dataset is probably 2GB to 3GB of logs.

It seems that when Elasticsearch has problems executing bulk index items, 
performance degrades considerably.

I'm wondering if there are any recommendations on Elasticsearch and 
logstash configuration.

Do I need to alter the logstash mapping?
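
One mapping-side option worth trying (an assumption on my part, not something confirmed in this thread) is to make the logstash index template tolerate malformed values instead of rejecting whole bulk items, via the ignore_malformed setting:

```json
PUT _template/logstash_lenient
{
  "template": "logstash-*",
  "settings": {
    "index.mapping.ignore_malformed": true
  }
}
```

Note this only covers values that fail to parse into the mapped type; type conflicts between objects and scalars still raise MapperParsingException, so cleaning the json at the logstash filter stage may also be needed.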

Thanks,
Jared




Re: connection problem to elasticsearch

2015-02-18 Thread Mark Walkom
It'd help if you could provide your config, but it's best not to paste
large amounts of logs or config to the list; just use gist/pastebin/etc and
then give us a link.

However it looks like you have discovery.zen.minimum_master_nodes: 2,
so unless you have two nodes then your single node will never be available.
You should set that to 1 if you are just playing with ES locally.
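
For a single-node playground that is a one-line change in elasticsearch.yml; the general rule for real clusters is a quorum of master-eligible nodes:

```yaml
# elasticsearch.yml -- single-node development box
discovery.zen.minimum_master_nodes: 1
# for N master-eligible nodes, set this to (N / 2) + 1
```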

On 19 February 2015 at 04:57, Ali Lotfdar  wrote:

> Hello All,
>
> I am completely confused about managing and connecting to Elasticsearch.
> Thanks a lot for helping me solve my problem.
>
> - I started to create some indices inside Elasticsearch, but even when I
> delete them they come back! I found that this is because they are still
> available on other nodes! Please let me know how I can delete them
> completely.
> - Also, I have to start two nodes to be able to work with Elasticsearch;
> otherwise I run into 'no known master node'!
> - When I had a problem and closed a node, nothing worked until I restarted
> my computer!
> - Also, all my index statuses are red!
>
> I just describe the situation to give you the full picture. Thank you for
> helping me get out of this dilemma!
>
>
> Please consider that I am a new user of Elasticsearch, and apologies for
> such questions!
>
>
> [2015-02-18 12:44:59,846][INFO ][node ] [Helleyes]
> version[1.4.2], pid[483], build[927caff/2014-12-16T14:11:12Z]
>
> [2015-02-18 12:44:59,847][INFO ][node ] [Helleyes]
> initializing ...
>
> [2015-02-18 12:44:59,861][INFO ][plugins  ] [Helleyes]
> loaded [marvel], sites [marvel]
>
> [2015-02-18 12:45:02,745][INFO ][node ] [Helleyes]
> initialized
>
> [2015-02-18 12:45:02,746][INFO ][node ] [Helleyes]
> starting ...
>
> [2015-02-18 12:45:02,811][INFO ][transport] [Helleyes]
> bound_address {inet[/0:0:0:0:0:0:0:0:9305]}, publish_address {inet[/
> 192.168.110.133:9305]}
>
> [2015-02-18 12:45:02,827][INFO ][discovery] [Helleyes]
> elasticsearch/boK22-mkRPGMo-1TSnXhZw
>
> [2015-02-18 12:45:06,599][INFO ][cluster.service  ] [Helleyes]
> new_master
> [Helleyes][boK22-mkRPGMo-1TSnXhZw][webdev99s-Mac-mini.local][inet[/192.168.110.133:9305]],
> reason: zen-disco-join (elected_as_master)
>
> [2015-02-18 12:45:06,708][INFO ][http ] [Helleyes]
> bound_address {inet[/0:0:0:0:0:0:0:0:9205]}, publish_address {inet[/
> 192.168.110.133:9205]}
>
> [2015-02-18 12:45:06,709][INFO ][node ] [Helleyes]
> started
>
> [2015-02-18 12:45:07,594][INFO ][discovery.zen] [Helleyes]
> updating discovery.zen.minimum_master_nodes from [-1] to [2]
>
> [2015-02-18 12:45:07,710][INFO ][gateway  ] [Helleyes]
> recovered [23] indices into cluster_state
>
> [2015-02-18 12:45:07,711][WARN ][discovery.zen] [Helleyes] not
> enough master nodes on change of minimum_master_nodes from [-1] to [2],
> current nodes:
> {[Helleyes][boK22-mkRPGMo-1TSnXhZw][webdev99s-Mac-mini.local][inet[/192.168.110.133:9305
> ]],}
>
> [2015-02-18 12:45:07,719][ERROR][cluster.action.shard ] [Helleyes]
> unexpected failure during [shard-started ([logstash-2015.02.05][0],
> node[boK22-mkRPGMo-1TSnXhZw], [P], s[INITIALIZING]), reason [after recovery
> from gateway]]
>
> org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: no
> longer master. source: [shard-started ([logstash-2015.02.05][0],
> node[boK22-mkRPGMo-1TSnXhZw], [P], s[INITIALIZING]), reason [after recovery
> from gateway]]
>
> at
> org.elasticsearch.cluster.ClusterStateUpdateTask.onNoLongerMaster(ClusterStateUpdateTask.java:53)
>
> at
> org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:324)
>
> at
> org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:153)
>
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>
> at java.lang.Thread.run(Thread.java:745)
>
> [2015-02-18 12:45:07,722][ERROR][cluster.action.shard ] [Helleyes]
> unexpected failure during [shard-started ([test3][2],
> node[boK22-mkRPGMo-1TSnXhZw], [P], s[INITIALIZING]), reason [after recovery
> from gateway]]
>
> org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: no
> longer master. source: [shard-started ([test3][2],
> node[boK22-mkRPGMo-1TSnXhZw], [P], s[INITIALIZING]), reason [after recovery
> from gateway]]
>
> at
> org.elasticsearch.cluster.ClusterStateUpdateTask.onNoLongerMaster(ClusterStateUpdateTask.java:53)
>
> at
> org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:324)
>
> at
> org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:153)
>
> at

Re: the effective way to update elasticsearch.yml

2015-02-18 Thread Xiaolin Xie
Hi Mark

Thanks a lot for the quick response. Does elasticsearch itself have a 
RESTful API for enabling the "http.compression" option? That way, one http 
request to the ES cluster would do the job for me.

Xiaolin.

On Wednesday, February 18, 2015 at 2:10:32 PM UTC-8, Mark Walkom wrote:
>
> Yes this is an update and restart.
>
> You should really be using things like puppet/chef/ansible to manage 
> config :)
>
> On 19 February 2015 at 08:51, Xiaolin Xie 
> > wrote:
>
>> Hi ES guys
>>
>> I am a n00b to elasticsearch. We have a cluster of ES nodes (24 hosts). I 
>> need to enable "http.compression" for this cluster. Do I have to 
>> manually edit the elasticsearch.yml file on each host and then restart the 
>> elasticsearch service on each host for the change to take effect? Is 
>> there an easier way to do that, such as an http endpoint to enable 
>> "http.compression"?
>>
>> Thanks a lot for the help.
>>
>> Xiaolin.
>>



multiple words but same token

2015-02-18 Thread NM
Hi,

I would like to index 

"driving with car A is a bad thing"

knowing that I have special words composed of several words, e.g. "car A" 
and "car B".

If I use the standard approach, the A or B is lost. Ideally, for "driving 
with car A is a bad thing" the analyzer should return:
( "driving", "car A", "bad", "thing") 

How would you do that? 

--- 
My filters might look like this:
filter: {
  protect_words: {
    type: "keyword_marker",
    keywords: ["car a", "car b"]
  },
  protect_words2: {
    type: "word_delimiter",
    protected_words: ["car a", "car b"]
  }
},
analyzer: {
  custom_level2: {
    tokenizer: 'standard',
    filter: ["asciifolding", "lowercase", "protect_words"],
    type: 'custom'
  },
  custom_level1: {
    tokenizer: 'keyword',
    filter: ["asciifolding", "lowercase", "protect_words2"],
    type: 'custom'
  }
}

any help?
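
One approach not shown above (a sketch; filter and analyzer names are made up) is a multi-word synonym filter that collapses each special phrase into a single token after the standard tokenizer has split it:

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "protect_phrases": {
          "type": "synonym",
          "synonyms": ["car a => car_a", "car b => car_b"]
        }
      },
      "analyzer": {
        "phrase_aware": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["asciifolding", "lowercase", "protect_phrases"]
        }
      }
    }
  }
}
```

With this, "driving with car A" analyzes to a stream containing the single token car_a; dropping words like "is"/"a"/"with" would additionally need a stop filter.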



Re: [QUESTIONS] - failover mechanisms and consistency

2015-02-18 Thread Nicolas Harraudeau
Thank you, simple answers are the best!



Re: Delete all documents in an Index - Using Python

2015-02-18 Thread Aaron Mildenstein
If you're deleting it all, why not just delete the whole index and 
re-create it?  It will certainly be faster, and less taxing to the system.

If you need to preserve a mapping, download it first so you can re-upload 
it later.
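
That sequence can be sketched with the elasticsearch-py client; the function name and wiring below are illustrative, not an official recipe:

```python
def recreate_index(es, index_name):
    """Delete an index and recreate it empty with the same mappings.

    ``es`` is assumed to be an elasticsearch-py client instance;
    only ``es.indices`` is used here.
    """
    # 1. Download the current mappings so they can be re-applied later.
    current = es.indices.get_mapping(index=index_name)
    mappings = current[index_name].get("mappings", {})
    # 2. Dropping the whole index is far cheaper than deleting every document.
    es.indices.delete(index=index_name)
    # 3. Recreate the empty index with the preserved mappings.
    es.indices.create(index=index_name, body={"mappings": mappings})
    return mappings
```

With a real client this is just recreate_index(Elasticsearch(), "myindex"); a delete-by-query alternative walks every document and is far more taxing, as noted above.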

--Aaron

On Wednesday, February 18, 2015 at 1:15:25 PM UTC-7, Amay Patil wrote:
>
> How can I delete all documents in an Index using Python.
>
> Ap
>



Re: [QUESTIONS] - failover mechanisms and consistency

2015-02-18 Thread Mark Walkom
1 - yes, if you are getting by the document ID
2 - yes, see
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html#index-consistency
3 - yes

Between step 6 and 7 the other replica will be updated.
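
For completeness, the consistency level and sync replication from point 2 are per-request parameters on the 1.x index API (index and type names here are placeholders):

```
PUT /myindex/mytype/1?consistency=quorum&replication=sync
{"field": "value"}
```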

On 18 February 2015 at 08:57, Nicolas Harraudeau <
nicolas.harraud...@gmail.com> wrote:

> I have been searching the documentation, stackoverflow and this google
> group for some answers about GET API and consistency but there are still
> some details I am not sure to understand correctly.
>
>
> Here are the things I think I understood, don’t hesitate to correct me if
> I am wrong:
>
>
> 1. a GET immediately following a PUT (with sync enabled) will always
> return the same document thanks to the transaction log. This is true even
> if the GET has the default ”random” preference. (we suppose no other
> process write at the same time)
>
> 2. even with a QUORUM consistency, a write operation in “sync” mode will
> always send the new doc to ALL replica and wait for their answers. QUORUM
> only changes the number of successful replications needed.
>
> 3. if a replica is down and comes back, it will have to synchronise with
> the other nodes before having the right to answer requests.
>
>
> What I don’t understand is the “how” this all works in case of a short
> node failure.
>
>
> Let’s take a simplified example:
>
> - 3 nodes: A, B and C
>
> - 1 shard with node A as primary node, B and C being replica
>
> - 1 single threaded client
>
>
> 1. the client PUT a doc in sync mode and QUORUM consistency.
>
> 2. the request is redirected to node A where it is written.
>
> 3. the doc is replicated to node B.
>
> 4. node C does not respond and fails to replicate (due for example to
> garbage collection)
>
> 5. as quorum is satisfied A returns a success
>
> 6. garbage collector finishes its job on node C. It can be contacted again.
>
> 7. Once the answer from node A is received the client performs a GET of
> the document with default (random) preference
>
>
> Here are the questions:
>
> - what happens between steps 4 and 5? Is node C unallocated immediately,
> before answering to the write request?
>
> - what happens between steps 6 and 7? The problem was very short and node
> C did not stop. Is it possible that node C does not realise it failed some
> requests and continue to answer client requests?
>
> - Do official client library detect that a node has been unallocated
> before sending a request?
>
> - What happens if a client does not check unallocated nodes in step 7 and
> sends the GET request directly to node C?
>
> - What happens if in step 7 the client sends the GET request to node B
> (not the primary one)? Does it know that B has been unallocated? if not,
> can the request be redirected to node C (as the preference is random)?
>
> - What happens if in step 7 the client sends the GET request to node A
> (primary shard)? (just to be sure)
>
>
>
> I have been using Elasticsearch for a few months now and you guys have
> done a really great job. Thank you for your hard work. I have not
> experienced the problems I described here, those are just scary things I
> imagined after reading the doc. Maybe these corner cases have already been
> explained. If so, I apologise.
>



Re: elastic search on t2.micro (Amazon WS)

2015-02-18 Thread Mark Walkom
Your only real option here is to get a machine with more RAM.
Try spinning up a VM locally, on your desk/laptop.
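
If the t2.micro must stay, the usual damage control (values are guesses for a 1GB box, not a recommendation from this thread) is a small fixed heap plus no replicas:

```yaml
# /etc/default/elasticsearch (or the environment):
# ES_HEAP_SIZE=512m

# elasticsearch.yml:
bootstrap.mlockall: true        # keep the heap from being swapped out
index.number_of_replicas: 0     # a single node cannot host replicas anyway
```

Even then, running ES and Couchbase together in 1GB will be tight.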

On 19 February 2015 at 00:52, Seung Chan Lim  wrote:

> I'm trying to see if I can get elastic search (ES) 1.3.8 working with
> couchbase (CB) 3.0.2 on a t2.micro (Amazon WS)
>
> t2.micro has 1 gig of RAM, which isn't a lot, but I'm only doing test
> development on this with not a lot of documents (<1000).
>
> I just installed ES and followed the CB instructions to install the plugin
> and set the XDCR to get the replication going from CB to ES.
>
> I also configured ES to have 0 replication and 1 shard (hoping this would
> help minimize RAM usage). But I'm still seeing behaviors from ES where it
> locks up the server, making it unresponsive and then eventually complaining of
> lack of memory.
>
> Is there something else I can do to get this working on a t2.micro?
>
> I'm a complete newbie to ES, and any help would be great,
>
> thank you
>
> slim
>



Re: the effective way to update elasticsearch.yml

2015-02-18 Thread Mark Walkom
Yes this is an update and restart.

You should really be using things like puppet/chef/ansible to manage config
:)
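
Since http.compression is a node-level setting rather than a dynamic cluster setting, each node needs the yml change plus a restart; with config management that is one run instead of 24 logins. A hedged ansible sketch (group name, file path, and service name are assumptions about your setup):

```yaml
# rolling-http-compression.yml -- illustrative only
- hosts: es_nodes
  serial: 1                      # touch one node at a time
  tasks:
    - name: enable http.compression
      lineinfile:
        dest: /etc/elasticsearch/elasticsearch.yml
        line: "http.compression: true"
    - name: restart elasticsearch
      service:
        name: elasticsearch
        state: restarted
```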

On 19 February 2015 at 08:51, Xiaolin Xie  wrote:

> Hi ES guys
>
> I am a n00b to elasticsearch. We have a cluster of ES nodes (24 hosts). I
> need to enable "http.compression" for this cluster. Do I have to
> manually edit the elasticsearch.yml file on each host and then restart the
> elasticsearch service on each host for the change to take effect? Is
> there an easier way to do that, such as an http endpoint to enable
> "http.compression"?
>
> Thanks a lot for the help.
>
> Xiaolin.
>



the effective way to update elasticsearch.yml

2015-02-18 Thread Xiaolin Xie
Hi ES guys

I am a n00b to elasticsearch. We have a cluster of ES nodes (24 hosts). I 
need to enable "http.compression" for this cluster. Do I have to 
manually edit the elasticsearch.yml file on each host and then restart the 
elasticsearch service on each host for the change to take effect? Is 
there an easier way to do that, such as an http endpoint to enable 
"http.compression"?

Thanks a lot for the help.

Xiaolin.



Re: how test plugin in eclipse in elasticsearch

2015-02-18 Thread Ali Lotfdar
Thank you.

On Tuesday, February 17, 2015 at 10:27:42 AM UTC-5, Itamar Syn-Hershko 
wrote:
>
> See 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/using-elasticsearch-test-classes.html
>
> --
>
> Itamar Syn-Hershko
> http://code972.com | @synhershko 
> Freelance Developer & Consultant
> Lucene.NET committer and PMC member
>
> On Tue, Feb 17, 2015 at 5:23 PM, Ali Lotfdar  > wrote:
>
>> Hi All,
>>
>> I found this topic in previous threads, but I need some help too.
>>
>> I want to know if it is possible to test my plugin before installing it 
>> inside ES, and if yes, how (e.g. feed it some sample data and see the 
>> result)?
>> Could you please let me know how this is possible and how I can debug it 
>> using a main method?
>>
>> Thanks,
>> Ali
>>



Re: Decreasing Heap Size Results in Better TPS, How can this happen??

2015-02-18 Thread Michael McCandless
Smaller JVM heap means more free RAM for the OS to cache hot pages from
your index ... in general you should only give the JVM as much as it needs
(will ever need) and a bit more for safety, and give the rest to the OS so
it can put hot parts of your index in RAM.

Mike McCandless

http://blog.mikemccandless.com

On Wed, Feb 18, 2015 at 3:50 PM, sri krishna  wrote:

> There are a couple of questions pertaining to the above graphs:
>
> 1) Yes, the GC time taken is doubled, but the rate (frequency) of GC cycles
> is higher (almost double/triple) at 16GB compared to 30GB.
>
> 2) Why isn't the full heap utilized before the memory is cleaned up? E.g.
> for 30GB the maximum usage is around 20GB; for 16GB it is 11GB.
>
> 3) This looks like thrashing! Is it because of the large index size (76.6GB)
> on a single host?
>
> PS: The GC used here is G1GC
>
>
> On Thursday, 19 February 2015 00:54:25 UTC+5:30, Jörg Prante wrote:
>>
>> So you believe in "the more heap the better"? This assumption you have
>> just proved wrong. Note, in your test, at 16GB heap, GC took half a second,
>> while at 30GB heap, GC took around a second (and maybe more overhead in
>> stop-the-world pauses). This is a hint about current JVM GC scalability of
>> Java 7.
>>
>> Jörg
>>
>> On Wed, Feb 18, 2015 at 7:45 PM, Srikanth Valiveti > > wrote:
>>
>>> We have 5 shards(give each size) totaling to 76.6GB in a single host
>>> c3.8xlarge  system (60gb
>>> ram, 32 core, 2*320 SSD )
>>>
>>> We have multiple fields in our record, but single field is ngram
>>> analyzed on which we search results for.  This search need to performed on
>>> all 5 shards of the host to get results, as there is no routing in our case.
>>>
>>> We observed huge variations in search TPS with *decrease* in elastic
>>> search heap memory size. Attached bigdesk images for both of the below
>>> cases!
>>>
>>> CASE1)
>>> When ess_heap size = 16gb
>>> Search tps observed is  50
>>>
>>> CASE2)
>>> When ess_heap_size=30gb
>>> Search tps observed is  18
>>>
>>> Surprising thing is as we decrease ess_heap_size the search tps got
>>> increased. All the resources(cpu,memory etc) are not fully utilized,  OS
>>> heap memory is not changed much, observed lot of zig zag ess_heap
>>> usage(increase and decrease of ess heap usage, may be because of high index
>>> size that need to be brought to RAM)  and reads I/Os  followed same zig zag
>>> manner in both the cases.
>>>
>>> Please note that we have run this experiment *multiple* times and
>>> observed the same pattern. Can you please guide on what is going wrong?
>>> what dec ess_heap increasing the tps, should we further decrease to achieve
>>> better tps or we doing something wrong?
>>>
>>> -Thanks
>>> Srikanth V.
>>>



Eliminate query terms based on their document frequency!

2015-02-18 Thread sri krishna
Hi,

How can we configure elasticsearch to eliminate query terms for a specific 
field based on a document frequency threshold?

For example, take the query "title:test AND title:west AND desc:world AND 
desc:hello".
Assume the document frequency threshold is set to 10 and a few terms in the 
query, i.e. desc:world and title:test, have a document frequency greater 
than 10; the query should then be changed to "title:west AND desc:hello".

One approach is to issue a query for each term and, based on the retrieved 
document count, eliminate the terms exceeding the given document frequency 
threshold, but this is not effective as it drastically increases the number 
of searches! 
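
Not mentioned in the question, but Elasticsearch has a built-in mechanism along these lines: the common terms query (and the cutoff_frequency option of the match query) demotes or skips high-frequency terms at search time, without extra per-term count requests. A sketch, with the field from the example above and the threshold expressed as a relative frequency:

```json
{
  "query": {
    "common": {
      "desc": {
        "query": "world hello",
        "cutoff_frequency": 0.001,
        "low_freq_operator": "and"
      }
    }
  }
}
```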



Re: Decreasing Heap Size Results in Better TPS, How can this happen??

2015-02-18 Thread sri krishna
There are a couple of questions pertaining to the above graphs:

1) Yes, the GC time taken is doubled, but the rate (frequency) of GC cycles 
is higher (almost double/triple) at 16GB compared to 30GB. 

2) Why isn't the full heap utilized before the memory is cleaned up? E.g. 
for 30GB the maximum usage is around 20GB; for 16GB it is 11GB.

3) This looks like thrashing! Is it because of the large index size (76.6GB) 
on a single host? 

PS: The GC used here is G1GC


On Thursday, 19 February 2015 00:54:25 UTC+5:30, Jörg Prante wrote:
>
> So you believe in "the more heap the better"? This assumption you have 
> just proved wrong. Note, in your test, at 16GB heap, GC took half a second, 
> while at 30GB heap, GC took around a second (and maybe more overhead in 
> stop-the-world pauses). This is a hint about current JVM GC scalability of 
> Java 7.
>
> Jörg
>
> On Wed, Feb 18, 2015 at 7:45 PM, Srikanth Valiveti  > wrote:
>
>> We have 5 shards(give each size) totaling to 76.6GB in a single host 
>> c3.8xlarge  system (60gb ram, 
>> 32 core, 2*320 SSD )
>>
>> We have multiple fields in our record, but single field is ngram analyzed 
>> on which we search results for.  This search need to performed on all 5 
>> shards of the host to get results, as there is no routing in our case.
>>
>> We observed huge variations in search TPS with *decrease* in elastic 
>> search heap memory size. Attached bigdesk images for both of the below 
>> cases!
>>  
>> CASE1)
>> When ess_heap size = 16gb
>> Search tps observed is  50
>>
>> CASE2)
>> When ess_heap_size=30gb
>> Search tps observed is  18
>>
>> Surprising thing is as we decrease ess_heap_size the search tps got 
>> increased. All the resources(cpu,memory etc) are not fully utilized,  OS 
>> heap memory is not changed much, observed lot of zig zag ess_heap 
>> usage(increase and decrease of ess heap usage, may be because of high index 
>> size that need to be brought to RAM)  and reads I/Os  followed same zig zag 
>> manner in both the cases.
>>
>> Please note that we have run this experiment *multiple* times and 
>> observed the same pattern. Can you please guide us on what is going wrong? 
>> Why does decreasing ess_heap increase the tps? Should we decrease it 
>> further to achieve better tps, or are we doing something wrong?
>>
>> -Thanks
>> Srikanth V.
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/f1d1bf51-ab4d-4242-94e5-4b5c3adf466c%40googlegroups.com
>>  
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>



Logstash agent and rsync

2015-02-18 Thread Greg Diamond
I am trying to use logstash to index a set of log files from our 
webservers.  The log files that I can access are stored on a network 
attached storage system and are copied over from the webservers using 
rsync.

Logstash agent ends up throwing this error:

{:timestamp=>"2015-02-18T13:10:45.19-0500", :message=>"A plugin had an 
unrecoverable error. Will restart this plugin.\n  Plugin: 
<LogStash::Inputs::File path=>[\"/nas/logs/04.log\", \"/nas/logs/05.log\"], 
start_position=>\"end\", sincedb_path=>\"/nas/logs/sincedb/since\">\n  
Error: Unknown error - Stale NFS file handle", :level=>:error}

We have many servers, each with its own log, so we were hoping not to have 
to install logstash on every webserver individually to follow the live logs.

Is there any solution for getting the logstash agent and rsync to work 
together?
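
For reference, a hypothetical sketch of the file input in question (the paths are placeholders, not taken from the post). One common mitigation is keeping the sincedb on local disk rather than on the NFS mount, which avoids stale-handle errors on the sincedb file itself; note that rsync replacing files wholesale can still confuse the input's inode tracking:

```
input {
  file {
    path => ["/nas/logs/*.log"]
    start_position => "end"
    # Keep the sincedb off the NFS mount:
    sincedb_path => "/var/lib/logstash/sincedb"
  }
}
```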

Thanks



Delete all documents in an Index - Using Python

2015-02-18 Thread Amay Patil
How can I delete all documents in an index using Python?

Ap
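
A minimal sketch, assuming the official `elasticsearch` Python client and a reachable cluster (the index name "myindex" and the address are placeholders). Building the bulk delete actions is pure Python, so it is shown as a function; the network calls are left commented out since they need a running cluster:

```python
def delete_actions(hits):
    """Map search hits to bulk delete actions for helpers.bulk()."""
    return [
        {"_op_type": "delete",
         "_index": hit["_index"],
         "_type": hit["_type"],
         "_id": hit["_id"]}
        for hit in hits
    ]

# from elasticsearch import Elasticsearch, helpers
# es = Elasticsearch(["localhost:9200"])
#
# # Option 1: drop the whole index (fastest, but mappings must be re-applied):
# # es.indices.delete(index="myindex")
#
# # Option 2: keep the index and delete every document via scan + bulk:
# hits = helpers.scan(es, index="myindex",
#                     query={"query": {"match_all": {}}})
# helpers.bulk(es, delete_actions(hits))
```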



Cluster hanging on node failure

2015-02-18 Thread Max Charas


Hello all of you bright people,

We’re currently running a smallish 300 GB cluster in production on 5 nodes 
with around 30 million docs. Everything works flawlessly except when a node 
really goes down (I mean a network or HW failure, or kill -9).

When we lose a node, the cluster becomes more or less completely 
unresponsive for a few minutes, both regarding indexing and querying. This 
is, of course, less than ideal as we have load 24/7.

I would really appreciate some help with understanding best practice 
settings to have a robust cluster. 

The first goal for us is for the cluster not to become unresponsive in the 
event of a node crash. After reading everything I could find on the web, I 
can't really tell whether ES is designed to be unresponsive for 
ping_retries*ping_timeout seconds, or whether the cluster will continue to 
serve query requests even during this time. Could anyone help me shed light 
on this?

Secondly, in the event of an even worse failure where the cluster goes into 
red state, would it be possible to allow the cluster to still serve 
read/query requests? 

I would be ever so grateful for anyone willing to help me understand how 
this works or what we would need to change to make our ES installation more 
robust.

I’ve included our config here:

cluster.name: clustername

node.name: nodename

path.data: /index

node.master: true

node.data: true

discovery.zen.minimum_master_nodes: 3

discovery.zen.ping.multicast.enabled: false

discovery.zen.ping.multicast.ping.enabled: false

discovery.zen.ping.unicast.enabled: true

discovery.zen.ping.unicast.hosts: ["host1","host2","host3"]

bootstrap.mlockall: true

index.number_of_shards: 10

action.disable_delete_all_indices: true

marvel.agent.exporter.es.hosts: ["marvel:9200"]
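
One set of knobs worth examining for the unresponsive window is zen fault detection. A hedged sketch of the relevant settings follows; the values are purely illustrative and should be tuned for your network, not copied:

```
# Illustrative values only - do not copy blindly.
discovery.zen.fd.ping_interval: 1s   # how often nodes ping each other
discovery.zen.fd.ping_timeout: 30s   # wait this long for each ping reply
discovery.zen.fd.ping_retries: 3     # failed pings before a node is declared dead
```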

 



Re: Decreasing Heap Size Results in Better TPS, How can this happen??

2015-02-18 Thread joergpra...@gmail.com
So you believe in "the more heap the better"? This assumption you have just
proved wrong. Note, in your test, at 16GB heap, GC took half a second,
while at 30GB heap, GC took around a second (and maybe more overhead in
stop-the-world pauses). This is a hint about current JVM GC scalability of
Java 7.

Jörg

On Wed, Feb 18, 2015 at 7:45 PM, Srikanth Valiveti <
vsrikanth.chi...@gmail.com> wrote:

> We have 5 shards(give each size) totaling to 76.6GB in a single host
> c3.8xlarge  system (60gb ram,
> 32 core, 2*320 SSD )
>
> We have multiple fields in our record, but a single field is ngram analyzed,
> on which we search for results.  This search needs to be performed on all 5
> shards of the host to get results, as there is no routing in our case.
>
> We observed huge variations in search TPS with *decrease* in elastic
> search heap memory size. Attached bigdesk images for both of the below
> cases!
>
> CASE1)
> When ess_heap size = 16gb
> Search tps observed is  50
>
> CASE2)
> When ess_heap_size=30gb
> Search tps observed is  18
>
> Surprising thing is as we decrease ess_heap_size the search tps got
> increased. All the resources(cpu,memory etc) are not fully utilized,  OS
> heap memory is not changed much, observed lot of zig zag ess_heap
> usage(increase and decrease of ess heap usage, may be because of high index
> size that need to be brought to RAM)  and reads I/Os  followed same zig zag
> manner in both the cases.
>
> Please note that we have run this experiment *multiple* times and
> observed the same pattern. Can you please guide us on what is going wrong?
> Why does decreasing ess_heap increase the tps? Should we decrease it further
> to achieve better tps, or are we doing something wrong?
>
> -Thanks
> Srikanth V.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/f1d1bf51-ab4d-4242-94e5-4b5c3adf466c%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>



How to simulate "nested_filter" in sort when not having nested types?

2015-02-18 Thread Roxana Balaci
Hi,

I have this simplified mapping:

{
  "someting": {
    "mappings": {
      "test": {
        "properties": {
          "abc": {
            "type": "string"
          },
          "titles": {
            "properties": {
              "count": {
                "type": "string"
              },
              "counta": {
                "type": "long"
              },
              "date1": {
                "type": "date",
                "format": "dateOptionalTime"
              }
            }
          }
        }
      }
    }
  }
}

So, there is an array, but not a nested type.

How can I achieve, without using scripts, something similar to what exists 
for nested types:

GET someting/test/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "titles.counta": {
        "order": "asc",
        "mode": "sum",
        "nested_filter": {
          "range": {
            "titles.date1": {
              "gte": "2015-01-01"
            }
          }
        }
      }
    }
  ]
}

I am interested in sorting on the counta values whose date is greater than 
the one I give as a parameter.

Thank you,
Roxana
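
For comparison, nested_filter only applies when titles is mapped as a nested type. A hypothetical mapping change (which would require reindexing, so it may not fit the constraint above) might look like:

```
PUT someting
{
  "mappings": {
    "test": {
      "properties": {
        "titles": {
          "type": "nested",
          "properties": {
            "count":  { "type": "string" },
            "counta": { "type": "long" },
            "date1":  { "type": "date", "format": "dateOptionalTime" }
          }
        }
      }
    }
  }
}
```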



Re: Read past EOF exception on .tis and .fdt file

2015-02-18 Thread Michael McCandless
ES has the index.shard.check_on_startup setting to run CheckIndex on startup
of a shard:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules.html
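
A hypothetical index-settings fragment using that option; "checksum" only verifies checksums, while "fix" is the expert-only value that can drop corrupted segments:

```
index.shard.check_on_startup: checksum
```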

Mike McCandless

http://blog.mikemccandless.com

On Wed, Feb 18, 2015 at 1:17 PM, Jilles van Gurp 
wrote:

> plus 1 for a less invasive way to recover data
>
> I had a similar issue today on one of our test servers where I eventually
> managed to recover my index by running CheckIndex on one of my shards. In
> my case, I also had to remove the translog recovery file to actually get
> the cluster green. This is one of those steps that seems to be omitted in
> most mentions of the CheckIndex tool in combination with ElasticSearch.
>
> Anyway, after this, I ran CheckIndex on some other shards that were
> supposedly fine and was a bit surprised when it actually reported and fixed
> some errors there too.
>
> This makes me wonder if there should be a proper API around this tool in
> elasticsearch that allows you to run proper corruption checks on the whole
> cluster and fix problems. It would be nice if you could run some
> diagnostics to confirm your data is actually 100% OK. I know elasticsearch
> has increasingly more checks that run on startup involving checksums, etc.
> But it also seems those checks failed to detect problems that CheckIndex
> seems to think need fixing. That sounds like something most admins would
> like to know about their cluster.
>
> On Thursday, February 12, 2015 at 10:44:26 AM UTC+1, Philipp Knobel wrote:
>>
>> Hi all,
>>
>> we recently had an issue with ES that it reported a file corruption (more
>> specifically a read past EOF error) after some imports/deletion for a
>> longer timeframe. ES reported on a few nodes a long garbage collection
>> time, but then was silent again until it started to show the EOF exception.
>> From what I could find on the internet this kind of exception can happen if
>> an OutOfMemory error is happening or no space on disk is left. Both did not
>> occur in our scenario. I don't understand how this could happen in the
>> first place. We're running ES 1.3.4 and we migrated a while ago from 0.20.
>>
>> *[2015-02-06 01:15:11.971 GMT] INFO ||
>> elasticsearch[3-6][scheduler][T#1] org.elasticsearch.monitor.jvm  [3-6]
>> [gc][young][618719][105280] duration [962ms], collections [1]/[1.6s], total
>> [962ms]/[16.8m], memory [435.2mb]->[425.9mb]/[1.9gb], all_pools {[young]
>> [28.2mb]->[5.3mb]/[546.1mb]}{[survivor] [6.3mb]->[6.3mb]/[68.2mb]}{[old]
>> [400.5mb]->[414.2mb]/[1.3gb]}*
>> *[2015-02-06 07:20:44.188 GMT] WARN || elasticsearch[3-6][[order][3]:
>> Lucene Merge Thread #17] org.elasticsearch.index.merge.scheduler  [3-6]
>> [order][3] failed to merge*
>> *java.io.EOFException: read past EOF:
>> NIOFSIndexInput(path="/data/cluster1/nodes/0/indices/order/3/index/_dr3z.fdt")*
>> *  at
>> org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:144)*
>> *  at
>> org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:116)*
>> *  at
>> org.apache.lucene.codecs.lucene3x.Lucene3xStoredFieldsReader.readField(Lucene3xStoredFieldsReader.java:273)*
>> *  at
>> org.apache.lucene.codecs.lucene3x.Lucene3xStoredFieldsReader.visitDocument(Lucene3xStoredFieldsReader.java:240)*
>> *  at
>> org.apache.lucene.index.SegmentReader.document(SegmentReader.java:341)*
>> *  at
>> org.apache.lucene.index.FilterAtomicReader.document(FilterAtomicReader.java:389)*
>> *  at org.apache.lucene.index.IndexReader.document(IndexReader.java:460)*
>> *  at
>> org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:355)*
>> *  at
>> org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:332)*
>> *  at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:100)*
>> *  at
>> org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4225)*
>> *  at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3820)*
>> *  at
>> org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)*
>> *  at
>> org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:106)*
>> *  at
>> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)*
>>
>> We ran a checkIndex and it reported that for this .fdt file and the
>> corresponding *.tis* file a read past EOF exception was discovered.
>>
>> *  2 of 29: name=_dr3z docCount=575018*
>> *codec=Lucene3x*
>> *compound=false*
>> *numFiles=11*
>> *size (MB)=512.496*
>> *diagnostics = {os=Linux, os.version=3.1.6, mergeFactor=10,
>> source=merge, lucene.version=3.6.2 1423725 - rmuir - 2012-12-18 19:45:40,
>> os.arch=amd64, mergeMaxNumSegments=-1, java.version=1.7.0_51,
>> java.vendor=Oracle Corporation}*
>> *has deletions [delGen=422]*
>> *test: open reader.OK*
>> *test: check integrity.OK*
>> *test: check live do

Re: Read past EOF exception on .tis and .fdt file

2015-02-18 Thread Jilles van Gurp
plus 1 for a less invasive way to recover data

I had a similar issue today on one of our test servers where I eventually 
managed to recover my index by running CheckIndex on one of my shards. In 
my case, I also had to remove the translog recovery file to actually get 
the cluster green. This is one of those steps that seems to be omitted in 
most mentions of the CheckIndex tool in combination with ElasticSearch.

Anyway, after this, I ran CheckIndex on some other shards that were 
supposedly fine and was a bit surprised when it actually reported and fixed 
some errors there too.

This makes me wonder if there should be a proper API around this tool in 
elasticsearch that allows you to run proper corruption checks on the whole 
cluster and fix problems. It would be nice if you could run some 
diagnostics to confirm your data is actually 100% OK. I know elasticsearch 
has increasingly more checks that run on startup involving checksums, etc. 
But it also seems those checks failed to detect problems that CheckIndex 
seems to think need fixing. That sounds like something most admins would 
like to know about their cluster.

On Thursday, February 12, 2015 at 10:44:26 AM UTC+1, Philipp Knobel wrote:
>
> Hi all,
>
> we recently had an issue with ES that it reported a file corruption (more 
> specifically a read past EOF error) after some imports/deletion for a 
> longer timeframe. ES reported on a few nodes a long garbage collection 
> time, but then was silent again until it started to show the EOF exception. 
> From what I could find on the internet this kind of exception can happen if 
> an OutOfMemory error is happening or no space on disk is left. Both did not 
> occur in our scenario. I don't understand how this could happen in the 
> first place. We're running ES 1.3.4 and we migrated a while ago from 0.20.
>
> *[2015-02-06 01:15:11.971 GMT] INFO || 
> elasticsearch[3-6][scheduler][T#1] org.elasticsearch.monitor.jvm  [3-6] 
> [gc][young][618719][105280] duration [962ms], collections [1]/[1.6s], total 
> [962ms]/[16.8m], memory [435.2mb]->[425.9mb]/[1.9gb], all_pools {[young] 
> [28.2mb]->[5.3mb]/[546.1mb]}{[survivor] [6.3mb]->[6.3mb]/[68.2mb]}{[old] 
> [400.5mb]->[414.2mb]/[1.3gb]}*
> *[2015-02-06 07:20:44.188 GMT] WARN || elasticsearch[3-6][[order][3]: 
> Lucene Merge Thread #17] org.elasticsearch.index.merge.scheduler  [3-6] 
> [order][3] failed to merge*
> *java.io.EOFException: read past EOF: 
> NIOFSIndexInput(path="/data/cluster1/nodes/0/indices/order/3/index/_dr3z.fdt")*
> *  at 
> org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:144)*
> *  at 
> org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:116)*
> *  at 
> org.apache.lucene.codecs.lucene3x.Lucene3xStoredFieldsReader.readField(Lucene3xStoredFieldsReader.java:273)*
> *  at 
> org.apache.lucene.codecs.lucene3x.Lucene3xStoredFieldsReader.visitDocument(Lucene3xStoredFieldsReader.java:240)*
> *  at 
> org.apache.lucene.index.SegmentReader.document(SegmentReader.java:341)*
> *  at 
> org.apache.lucene.index.FilterAtomicReader.document(FilterAtomicReader.java:389)*
> *  at org.apache.lucene.index.IndexReader.document(IndexReader.java:460)*
> *  at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:355)*
> *  at 
> org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:332)*
> *  at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:100)*
> *  at 
> org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4225)*
> *  at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3820)*
> *  at 
> org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)*
> *  at 
> org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:106)*
> *  at 
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)*
>
> We ran a checkIndex and it reported that for this .fdt file and the 
> corresponding *.tis* file a read past EOF exception was discovered.
>
> *  2 of 29: name=_dr3z docCount=575018*
> *codec=Lucene3x*
> *compound=false*
> *numFiles=11*
> *size (MB)=512.496*
> *diagnostics = {os=Linux, os.version=3.1.6, mergeFactor=10, 
> source=merge, lucene.version=3.6.2 1423725 - rmuir - 2012-12-18 19:45:40, 
> os.arch=amd64, mergeMaxNumSegments=-1, java.version=1.7.0_51, 
> java.vendor=Oracle Corporation}*
> *has deletions [delGen=422]*
> *test: open reader.OK*
> *test: check integrity.OK*
> *test: check live docs.OK [419388 deleted docs]*
> *test: fields..OK [132 fields]*
> *test: field norms.OK [48 fields]*
> *test: terms, freq, prox...ERROR: java.io.EOFException: seek past EOF: 
> MMapIndexInput(path="/data/cluster1/nodes/0/indices/order/3/index/_dr3z.tis")*
> *java.io.EOFException: seek past EOF: 
> MMapInd

connection problem to elasticsearch

2015-02-18 Thread Ali Lotfdar
Hello All,

I am completely confused about managing and connecting to Elasticsearch. 
Thank you very much for helping me solve my problem.

- I started to create some indices inside Elasticsearch, but even when I 
delete them they come back! I found that this is because they are still 
available on other nodes! Please let me know how I can delete them 
completely.
- Also, I have to start two nodes to be able to work with Elasticsearch, 
otherwise I run into 'no known master node'!
- When I have a problem and shut down a node, nothing works until I restart 
my computer!
- Also, all my index statuses are red!

I have described the situation above to give you an idea of the condition. 
Thank you for helping me get out of this dilemma! 


Please consider that I am a new user of Elasticsearch, and I apologize for 
such questions!  


[2015-02-18 12:44:59,846][INFO ][node ] [Helleyes] 
version[1.4.2], pid[483], build[927caff/2014-12-16T14:11:12Z]

[2015-02-18 12:44:59,847][INFO ][node ] [Helleyes] 
initializing ...

[2015-02-18 12:44:59,861][INFO ][plugins  ] [Helleyes] 
loaded [marvel], sites [marvel]

[2015-02-18 12:45:02,745][INFO ][node ] [Helleyes] 
initialized

[2015-02-18 12:45:02,746][INFO ][node ] [Helleyes] 
starting ...

[2015-02-18 12:45:02,811][INFO ][transport] [Helleyes] 
bound_address {inet[/0:0:0:0:0:0:0:0:9305]}, publish_address 
{inet[/192.168.110.133:9305]}

[2015-02-18 12:45:02,827][INFO ][discovery] [Helleyes] 
elasticsearch/boK22-mkRPGMo-1TSnXhZw

[2015-02-18 12:45:06,599][INFO ][cluster.service  ] [Helleyes] 
new_master 
[Helleyes][boK22-mkRPGMo-1TSnXhZw][webdev99s-Mac-mini.local][inet[/192.168.110.133:9305]],
 
reason: zen-disco-join (elected_as_master)

[2015-02-18 12:45:06,708][INFO ][http ] [Helleyes] 
bound_address {inet[/0:0:0:0:0:0:0:0:9205]}, publish_address 
{inet[/192.168.110.133:9205]}

[2015-02-18 12:45:06,709][INFO ][node ] [Helleyes] 
started

[2015-02-18 12:45:07,594][INFO ][discovery.zen] [Helleyes] 
updating discovery.zen.minimum_master_nodes from [-1] to [2]

[2015-02-18 12:45:07,710][INFO ][gateway  ] [Helleyes] 
recovered [23] indices into cluster_state

[2015-02-18 12:45:07,711][WARN ][discovery.zen] [Helleyes] not 
enough master nodes on change of minimum_master_nodes from [-1] to [2], 
current nodes: 
{[Helleyes][boK22-mkRPGMo-1TSnXhZw][webdev99s-Mac-mini.local][inet[/192.168.110.133:9305]],}

[2015-02-18 12:45:07,719][ERROR][cluster.action.shard ] [Helleyes] 
unexpected failure during [shard-started ([logstash-2015.02.05][0], 
node[boK22-mkRPGMo-1TSnXhZw], [P], s[INITIALIZING]), reason [after recovery 
from gateway]]

org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: no 
longer master. source: [shard-started ([logstash-2015.02.05][0], 
node[boK22-mkRPGMo-1TSnXhZw], [P], s[INITIALIZING]), reason [after recovery 
from gateway]]

at 
org.elasticsearch.cluster.ClusterStateUpdateTask.onNoLongerMaster(ClusterStateUpdateTask.java:53)

at 
org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:324)

at 
org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:153)

at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:745)

[2015-02-18 12:45:07,722][ERROR][cluster.action.shard ] [Helleyes] 
unexpected failure during [shard-started ([test3][2], 
node[boK22-mkRPGMo-1TSnXhZw], [P], s[INITIALIZING]), reason [after recovery 
from gateway]]

org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: no 
longer master. source: [shard-started ([test3][2], 
node[boK22-mkRPGMo-1TSnXhZw], [P], s[INITIALIZING]), reason [after recovery 
from gateway]]

at 
org.elasticsearch.cluster.ClusterStateUpdateTask.onNoLongerMaster(ClusterStateUpdateTask.java:53)

at 
org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:324)

at 
org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:153)

at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:745)

[2015-02-18 12:45:08,302][WARN ][cluster.action.shard ] [Helleyes] 
can't send shard started for [.marvel-2015.02.13][0], 
node[boK22-mkRPGMo-1TSnXhZw], [P], s[INITIALIZING]. no master known.

[2015-02-18 12:45:10,771][WARN ][cluster.action.shard ] [Helleyes] 
can't send shard started for [.marvel-2015.02.18][0], 
node[boK22-mkRPGMo-1TSnXhZw], [P], s[INITIALIZING]. no mas

Re: [Hadoop][Spark] Exclude metadata fields from _source

2015-02-18 Thread Costin Leau
Hi Itay,

Sorry I missed your email. I'm not clear from your post what your documents
look like - can you post a gist somewhere with the JSON input that you are
sending to Elasticsearch?
Typically the metadata appear in the _source if they are declared that way.
You should be able to work around this by using:
1. es.mapping.exclude - if it doesn't seem to be working;
2. in the case of Spark, the `saveWithMeta` methods, which specify the
metadata separately so it stays decoupled from the object itself.

Since you are using JSON, option 1 is likely your best shot. If it doesn't
work for you, can you please raise an issue with a quick/small sample so
the problem can be reproduced?

Thanks,


On Wed, Feb 18, 2015 at 10:27 AM, Itai Yaffe  wrote:

> Hey,
> Has anyone experienced with such an issue?
> Perhaps Costin can help here?
>
> Thanks!
>
> On Thursday, February 12, 2015 at 8:27:14 AM UTC+2, Itai Yaffe wrote:
>
>> Hey,
>> I've recently started using Elasticsearch for Spark (Scala application).
>> I've added elasticsearch-spark_2.10 version 2.1.0.BUILD-SNAPSHOT to my
>> Spark application pom file, and used 
>> org.apache.spark.rdd.RDD[String].saveJsonToEs()
>> to send documents to Elasticsearch.
>> When the documents are loaded to Elasticsearch, my metadata fields (e.g
>> id, index, etc.) are being loaded as part of the _source field.
>> Is there a way to exclude them from the _source?
>> I've tried using the new "es.mapping.exclude" configuration property
>> (added in this commit
>> 
>> - that's why I needed to take the latest build rather than using version
>> 2.1.0.Beta3), but it doesn't seem to have any affect (although I'm not sure
>> it's even possible to exclude fields I'm using for mapping, e.g "
>> es.mapping.id").
>>
>> A code snippet (I'm using a single-node Elasticsearch cluster for testing
>> purposes and running the Spark app from my desktop) :
>> val conf = new SparkConf()...
>> conf.set("es.index.auto.create", "false")
>> conf.set("es.nodes.discovery", "false")
>> conf.set("es.nodes", "XXX:9200")
>> conf.set("es.update.script", "XXX")
>> conf.set("es.update.script.params", "param1:events")
>> conf.set("es.update.retry.on.conflict" , "2")
>> conf.set("es.write.operation", "upsert")
>> conf.set("es.input.json", "true")
>> val documentsRdd =  ...
>> documentsRdd.saveJsonToEs("test/user", scala.collection.Map("es.
>> mapping.id" -> "_id", "es.mapping.exclude" -> "_id"))
>>
>> The JSON looks like that :
>> {
>>   "_id": "",
>>   "_type": "user",
>>   "_index": "test",
>>   "params": {
>> "events": [
>>   {
>> ...
>>   }
>> ]
>>   }
>>
>> Thanks!
>> }
>>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/aea88dfb-8d4b-49d1-a236-8de6d513b4f6%40googlegroups.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>



Re: One shard continually fails to allocate

2015-02-18 Thread Aaron C. de Bruyn
*grumble*
That's it.
Two of the nodes are FreeBSD, the other two are Linux.
It appears the two Linux nodes 'magically' updated themselves to 1.4.3...

Thanks for the help.

-A

On Wed, Feb 18, 2015 at 9:06 AM, Todd Nine  wrote:
> Hey Aaron,
>   What do you get back if you try to use these sets of commands to manually
> allocate the shard to a node?
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-reroute.html
>
> I had this problem before, but it turned out we had 1 node that had
> accidentally been upgraded while the rest were still on a previous version. I
> was able to determine this by reading the error output from the shard
> allocation command.
>
>
> Todd
>
> On Tuesday, February 17, 2015 at 11:48:48 PM UTC-8, David Pilato wrote:
>>
>> What gives
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cat-shards.html#cat-shards
>> ?
>>
>> --
>> David ;-)
>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>>
>> Le 18 févr. 2015 à 06:44, Aaron C. de Bruyn  a écrit :
>>
>> All the servers have nearly 1 TB free space.
>>
>> -A
>>
>> On Tue, Feb 17, 2015 at 7:44 PM, David Pilato  wrote:
>>
>> It's a replica?
>>
>> Might be because you are running low on disk space?
>>
>>
>> --
>>
>> David ;-)
>>
>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>>
>>
>> Le 18 févr. 2015 à 01:16, Aaron de Bruyn  a écrit :
>>
>>
>> I have one shard that continually fails to allocate.
>>
>>
>> There is nothing in the logs that would seem to indicate a problem on any
>> of
>>
>> the servers.
>>
>>
>> The pattern of one of the copies of shard '2' not being allocated runs
>>
>> throughout all my logstash indexes.
>>
>>
>> Running 1.4.3 on all nodes.
>>
>>
>> Any pointers on what I should check?
>>
>>
>> Thanks,
>>
>>
>> -A
>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEE%2BrGr9Gd6p3n_TC_gTg4yXo1tdK%2B_EKZzD-5t8-y5gjX4RzQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: One shard continually fails to allocate

2015-02-18 Thread Todd Nine
Hey Aaron,
  What do you get back if you try to use this set of commands to manually 
allocate the shard to a node?

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-reroute.html

I had this problem before, but it turned out we had one node that had 
accidentally been upgraded while the rest were still on a previous version. 
I was able to determine this by reading the error output from the shard 
allocation command. 
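For reference, a manual allocation attempt against a 1.x cluster looks roughly like this (the index, shard number, and node name below are placeholders — substitute the values from your own _cat/shards output):

```shell
# Ask the master to place the unassigned replica on a specific node.
# If the command is rejected, the error message in the response usually
# states the real reason the allocator keeps refusing the shard.
curl -XPOST 'localhost:9200/_cluster/reroute?pretty' -d '{
  "commands": [
    {
      "allocate": {
        "index": "logstash-2015.02.09",
        "shard": 2,
        "node": "tetrad",
        "allow_primary": false
      }
    }
  ]
}'
```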


Todd

On Tuesday, February 17, 2015 at 11:48:48 PM UTC-8, David Pilato wrote:
>
> What gives 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cat-shards.html#cat-shards
> ?
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
> On 18 Feb 2015, at 06:44, Aaron C. de Bruyn wrote:
>
> All the servers have nearly 1 TB free space.
>
> -A
>
> On Tue, Feb 17, 2015 at 7:44 PM, David Pilato wrote:
>
> It's a replica?
>
> Might be because you are running low on disk space?
>
>
> --
>
> David ;-)
>
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
>
> On 18 Feb 2015, at 01:16, Aaron de Bruyn wrote:
>
>
> I have one shard that continually fails to allocate.
>
>
> There is nothing in the logs that would seem to indicate a problem on any 
> of
>
> the servers.
>
>
> The pattern of one of the copies of shard '2' not being allocated runs
>
> throughout all my logstash indexes.
>
>
> Running 1.4.3 on all nodes.
>
>
> Any pointers on what I should check?
>
>
> Thanks,
>
>
> -A
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/91d7e258-6886-4344-b990-5ca4d0b2888c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: One shard continually fails to allocate

2015-02-18 Thread Aaron C. de Bruyn
Nada.  ;)

root@tetrad:~# curl 'localhost:9200/_cat/pending_tasks?v'
insertOrder timeInQueue priority source
root@tetrad:~#

-A

On Wed, Feb 18, 2015 at 8:59 AM, David Pilato  wrote:
> And this?
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cat-pending-tasks.html
>
> --
> David Pilato | Technical Advocate | Elasticsearch.com
> @dadoonet | @elasticsearchfr | @scrutmydocs
>
>
>
> On 18 Feb 2015, at 17:44, Aaron C. de Bruyn wrote:
>
> I did some playing last night, but was unable to figure it out.

Re: One shard continually fails to allocate

2015-02-18 Thread David Pilato
And this? 

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cat-pending-tasks.html
 

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet  | @elasticsearchfr 
 | @scrutmydocs 




> On 18 Feb 2015, at 17:44, Aaron C. de Bruyn wrote:
> 
> I did some playing last night, but was unable to figure it out.

Re: One shard continually fails to allocate

2015-02-18 Thread Aaron C. de Bruyn
I did some playing last night, but was unable to figure it out.
Looking at the shards this morning gives me this:

root@tetrad:~# curl -s '127.0.0.1:9200/_cat/shards?v'
index               shard prirep state         docs   store ip        node
intranet            2     p      STARTED         68  56.9kb 127.0.0.1 escorp
intranet            2     r      STARTED         68  56.9kb 127.0.0.1 backupnas1
intranet            2     r      STARTED         68  56.9kb 127.0.1.1 tetrad
intranet            0     r      STARTED         62  43.3kb 127.0.0.1 escorp
intranet            0     p      STARTED         62  43.3kb 127.0.0.1 backupnas1
intranet            0     r      STARTED         62  43.3kb 127.0.1.1 tetrad
intranet            3     p      STARTED         66  47.8kb 127.0.0.1 escorp
intranet            3     r      STARTED         66  47.8kb 127.0.0.1 backupnas1
intranet            3     r      STARTED         66  47.8kb 127.0.1.1 tetrad
intranet            1     p      STARTED         69  58.1kb 127.0.0.1 escorp
intranet            1     r      STARTED         69  58.1kb 127.0.0.1 backupnas1
intranet            1     r      STARTED         69  55.2kb 127.0.1.1 tetrad
intranet            4     p      STARTED         64  43.9kb 127.0.0.1 escorp
intranet            4     r      STARTED         64  46.8kb 127.0.0.1 backupnas1
intranet            4     r      STARTED         64  43.9kb 127.0.1.1 tetrad
logstash-2015.02.09 4     p      STARTED          0    115b 127.0.1.1 tetrad
logstash-2015.02.09 4     r      UNASSIGNED
logstash-2015.02.09 0     r      STARTED          0    115b 127.0.0.1 escorp
logstash-2015.02.09 0     p      STARTED          0    115b 127.0.0.1 backupnas1
logstash-2015.02.09 3     p      STARTED          0    115b 127.0.0.1 escorp
logstash-2015.02.09 3     r      STARTED          0    115b 127.0.1.1 tetrad
logstash-2015.02.09 1     p      STARTED          0    115b 127.0.0.1 escorp
logstash-2015.02.09 1     r      STARTED          0    115b 127.0.0.1 backupnas1
logstash-2015.02.09 2     r      STARTED          1   7.4kb 127.0.0.1 escorp
logstash-2015.02.09 2     p      STARTED          1   7.4kb 127.0.0.1 backupnas1
logstash-2015.02.16 4     p      STARTED    2538505   774mb 127.0.1.1 tetrad
logstash-2015.02.16 0     p      STARTED    3168221   1.1gb 127.0.0.1 backupnas1
logstash-2015.02.16 3     p      STARTED    3171176   1.1gb 127.0.0.1 backupnas1
logstash-2015.02.16 1     p      STARTED    2543041 773.9mb 127.0.1.1 tetrad
logstash-2015.02.16 2     p      STARTED    3169607   1.1gb 127.0.0.1 escorp
logstash-2015.02.07 2     p      STARTED          0    115b 127.0.0.1 backupnas1
logstash-2015.02.07 2     r      STARTED          0    115b 127.0.1.1 tetrad
logstash-2015.02.07 0     r      STARTED          0    115b 127.0.0.1 escorp
logstash-2015.02.07 0     p      STARTED          0    115b 127.0.0.1 backupnas1
logstash-2015.02.07 3     p      STARTED          1   7.4kb 127.0.0.1 escorp
logstash-2015.02.07 3     r      STARTED          1   7.4kb 127.0.1.1 tetrad
logstash-2015.02.07 1     p      STARTED          0    115b 127.0.0.1 escorp
logstash-2015.02.07 1     r      STARTED          0    115b 127.0.0.1 backupnas1
logstash-2015.02.07 4     r      STARTED          0    115b 127.0.0.1 escorp
logstash-2015.02.07 4     p      STARTED          0    115b 127.0.0.1 backupnas1
logstash            2     r      STARTED          4  45.8kb 127.0.0.1 escorp
logstash            2     p      STARTED          4  45.8kb 127.0.0.1 backupnas1
logstash            2     r      STARTED          4  45.8kb 127.0.1.1 tetrad
logstash            0     p      STARTED          1  10.9kb 127.0.0.1 escorp
logstash            0     r      STARTED          1  10.9kb 127.0.0.1 backupnas1
logstash            0     r      STARTED          1  10.9kb 127.0.1.1 tetrad
logstash            3     p      STARTED          0    115b 127.0.0.1 escorp
logstash            3     r      STARTED          0    115b 127.0.0.1 backupnas1
logstash            3     r      STARTED          0    115b 127.0.1.1 tetrad
logstash            1     r      STARTED         10  66.5kb 127.0.0.1 escorp
logstash            1     p      STARTED         10  66.5kb 127.0.0.1 backupnas1
logstash            1     r      STARTED         10  66.5kb 127.0.1.1 tetrad
logstash            4     p      STARTED          2  25.3kb 127.0.0.1 escorp
logstash            4     r      STARTED          2  25.3kb 127.0.0.1 backupnas1
logstash            4     r      STARTED          2  25.3kb 127.0.1.1 tetrad
logstash-2015.02.15 4     p      STARTED    4207611 907.5mb 127.0.1.1 tetrad
logstash-2015.02.15 0     p      STARTED    4208955 908.6mb 127.0.1.1 tetrad
logstash-2015.02.15 3     p      STARTED    4209006 909.1mb 127.0.1.1 tetrad
logstash-2015.02.15 1     p      STARTED    4213071 909.5mb 127.0.1.1 tetrad
logstash-2015.02.15 2     p      STARTED    4210380 909.1mb 127.0.1.1 tetrad
logstash-2015.02.18 4     p      STARTED    3875654   1.5gb 127.0.0.1 escorp
logstash-2015.0

Re: Elasticsearch AsyncAppender configuration

2015-02-18 Thread joergpra...@gmail.com
If you remove the log4j jar, ES logging is initialized by a logging
facade called SLF4J, which passes initialization on to the underlying
logging framework, e.g. log4j2:

https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/common/logging/ESLoggerFactory.java#L40
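For illustration, a minimal log4j2.xml along those lines might look like this — the host and port of the receiving logstash instance are assumptions; the AsyncAppender simply wraps the blocking SocketAppender:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="warn">
  <Appenders>
    <!-- Blocking socket appender shipping events to logstash -->
    <Socket name="logstash" host="logstash.example.com" port="4560">
      <PatternLayout pattern="%d{ISO8601} %-5p [%t] %c{1.} %m%n"/>
    </Socket>
    <!-- Async wrapper so the logging call never blocks on the socket -->
    <Async name="async">
      <AppenderRef ref="logstash"/>
    </Async>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="async"/>
    </Root>
  </Loggers>
</Configuration>
```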

Jörg

On Wed, Feb 18, 2015 at 4:28 PM, Mohamed Amine ABDESSEMED <
amine...@gmail.com> wrote:

> I can't see how removing the log4j jar and logging.yml will allow
> elasticsearch to init log4j2 using the XML conf file, since the allowed
> suffixes are 'yml, yaml, json, properties' (
> https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/common/logging/log4j/LogConfigurator.java#L48)?
>
> Amine
>
> On Wednesday, February 18, 2015 at 9:12:15 AM UTC+1, Jörg Prante wrote:
>>
>> It's easy, just remove log4j jar and config/logging.yml, then add
>> log4j2.xml, log4j-api, log4j-core, log4j-slf4j-impl and slf4j-api jars to
>> the lib folder. Then you can set up async appenders with log4j2, which are
>> 18x more performant than log4j 1.x appenders.
>>
>> Jörg
>>
>> On Tue, Feb 17, 2015 at 5:42 PM, Mohamed Amine ABDESSEMED <
>> amin...@gmail.com> wrote:
>>
>>> We've set up elasticsearch logging as described here logging
>>> elasticsearch events with logstash
>>> 
>>>  and
>>> we want to use an AsyncAppender instead of using a SocketAppender directly
>>> in order to avoid freeze times while logging but only xml log4j conf file
>>> format supports AsyncAppender.
>>>
>>> Is there another way to achieve this since elasticsearch doesn't use xml
>>> conf files?
>>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoE0M%3DY%2BSYXpJPLBzUKu6GGvpNOj4iOUhPX-TU-1XHS7LA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Is it safe to change node names in an existing ElasticSearch cluster

2015-02-18 Thread David Pilato
Have a look at 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-shutdown.html#_rolling_restart_of_nodes_full_cluster_restart
 


-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet  | @elasticsearchfr 
 | @scrutmydocs 




> On 18 Feb 2015, at 16:37, Jan-Erik Westlund wrote:
> 
> Thanks David !
> 
> All my "Recovery Throttling settings" are default in the elasticsearch.yml 
> file.
> How do I disable allocation, in a running production environment ? 
> Do I need to disable allocation first, restart each node / daemon, and after 
> rename the nodes ?
> 
> Or maybe it would be better to down the ES cluster (all 3 nodes) during a 
> maintenance windows, change all names, and then restart the ES cluster nodes 
> again ?
> 
> //Jan-Erik
> 
> On Wednesday 18 February 2015 at 16:18:42 UTC+1, David Pilato wrote:
> Yes. It’s safe.
> You can do it one at a time.
> 
> If you already have data around and don’t want your shards moving during 
> this, you should disable allocation.
> 
> 
> -- 
> David Pilato | Technical Advocate | Elasticsearch.com 
> 
> @dadoonet  | @elasticsearchfr 
>  | @scrutmydocs 
> 
> 
> 
> 
>> On 18 Feb 2015, at 16:14, Jan-Erik Westlund wrote:
>> 
>> Hi !
>> 
>> Is it safe to change the node names of my 3 nodes in an existing 
>> elasticsearch 1.4.0 cluster ?
>> 
>> The reason is to get rid of the random names like: Elizabeth "Betsy" 
>> Braddock, Franz Kafka, etc...
>> 
>> Is it just to set the node.name : "server name" in 
>> elasticsearch.yml and then restart the daemon ? 
>> Do I do it one node at the time, or do I need down the cluster and then 
>> change all node names, and then bring up the cluster again ?
>> 
>> 
>> //Jan-Erik
>> 

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/639A13FF-A6F9-4309-9D8A-B4C83184D6EE%40pilato.fr.
For more options, visit https://groups.google.com/d/optout.


Re: [ES-1.4.0] Changing linux user ID

2015-02-18 Thread Yarden Bar
Issue was solved by modifying the user/group IDs to match on all machines.

Thanks for the ideas :) 
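For reference, the shared-group variant David suggests below might look like this — the group name, GID, user name, and mount point are all assumptions, and the group needs the same GID on every machine for NFS to honor it:

```shell
# Run as root on each machine: create a group with a fixed GID, add the
# user that runs elasticsearch to it, and open the shared snapshot
# directory to that group.
groupadd -g 600 es-snap
usermod -aG es-snap elasticsearch
chgrp -R es-snap /mnt/es-snapshots
chmod -R g+rwX /mnt/es-snapshots
```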

On Wednesday, February 18, 2015 at 10:50:50 AM UTC+2, David Pilato wrote:
>
> Well if you are using a shared FS for snapshot/restore, you need to make 
> sure that each node user can write to this dir. 
> Try from all machines to write a file when running as es_01 and es_02
> May be you could add them to the same group and give write privileges to 
> the group?
>
> David
>
> On 18 Feb 2015, at 09:09, Yarden Bar wrote:
>
> Hi All,
>
> I encountered a 'Permission Denied' error while using the ES Snapshot API to 
> add an 'fs' repository pointing at an NFS shared folder.
>
> My cluster structure is:
>
>- NFS_MACHINE - user id is 600
>- ES_01 - user ID that runs ES is 1000
>- ES_02 - user ID that runs ES is 600
>
> The issue seems to be that the difference in user IDs between the machines 
> generated the permission issues.
>
> My question is about ES files across a machine's file system: are there any 
> files outside the ES installation folder (except for data/logs/configuration)?
>
> Thanks,
> Yarden
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/01f0238b-cdca-46fc-bd38-bf12fcabeaeb%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Is it safe to change node names in an existing ElasticSearch cluster

2015-02-18 Thread Jan-Erik Westlund
Thanks David !

All my "Recovery Throttling settings" are default in the elasticsearch.yml 
file.
How do I disable allocation in a running production environment? 
Do I need to disable allocation first, restart each node/daemon, and then 
rename the nodes?

Or maybe it would be better to take the ES cluster (all 3 nodes) down during a 
maintenance window, change all the names, and then restart the ES cluster 
nodes again?

//Jan-Erik

On Wednesday 18 February 2015 at 16:18:42 UTC+1, David Pilato wrote:
>
> Yes. It’s safe.
> You can do it one at a time.
>
> If you already have data around and don’t want your shards moving during 
> this, you should disable allocation.
>
>
> -- 
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com 
> *
> @dadoonet  | @elasticsearchfr 
>  | @scrutmydocs 
> 
>
>
>  
> On 18 Feb 2015, at 16:14, Jan-Erik Westlund wrote:
>
> Hi !
>
> Is it safe to change the node names of my 3 nodes in an existing 
> elasticsearch 1.4.0 cluster ?
>
> The reason is to get rid of the random names like: Elizabeth "Betsy" 
> Braddock, Franz Kafka, etc...
>
> Is it just to set the node.name: "server name" in elasticsearch.yml and 
> then restart the daemon ? 
> Do I do it one node at the time, or do I need down the cluster and then 
> change all node names, and then bring up the cluster again ?
>
>
> //Jan-Erik
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a79c0ae3-d786-4bf4-80cb-61acdb8804d3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


ClassCastException when using Java API

2015-02-18 Thread Tim Molter
Hi,

I use the mappings API to get my datasource's mapping:

transportClient.admin().indices().getMappings(new GetMappingsRequest().indices("myIndex")).get();


I then use what the mapping says to cast fields in returned documents. For 
one of the fields, the mapping says it's a long, but when I cast it I get:

! java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Long

when using the `getSource()` method, which returns a map.

long ts = (long) searchHit.getSource().get("ts");


Is this a bug in ES? It says it's a long, but the underlying Object in the 
map value is apparently an Integer!
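This is known behavior rather than a bug: the mapping describes how the field is indexed, but `getSource()` hands back the original JSON source re-parsed, and JSON numbers that fit in 32 bits come back boxed as `Integer`. A cast that tolerates either boxed type goes through `Number` — a minimal sketch (the map below just simulates what `getSource()` returns):

```java
import java.util.HashMap;
import java.util.Map;

public class NumberCastDemo {
    public static void main(String[] args) {
        // Simulate searchHit.getSource(): the JSON parser boxed "ts" as Integer.
        Map<String, Object> source = new HashMap<>();
        source.put("ts", 1424246400);

        // (long) source.get("ts") would throw ClassCastException here;
        // casting through Number works for Integer, Long, Double, etc.
        long ts = ((Number) source.get("ts")).longValue();
        System.out.println(ts);
    }
}
```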

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ec406cd1-2052-4b46-90ff-189267ff9832%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Elasticsearch AsyncAppender configuration

2015-02-18 Thread Mohamed Amine ABDESSEMED
I can't see how removing the log4j jar and logging.yml will allow 
elasticsearch to init log4j2 using the XML conf file, since the allowed 
suffixes are 'yml, yaml, json, properties' (
https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/common/logging/log4j/LogConfigurator.java#L48)?

Amine

On Wednesday, February 18, 2015 at 9:12:15 AM UTC+1, Jörg Prante wrote:
>
> It's easy, just remove log4j jar and config/logging.yml, then add 
> log4j2.xml, log4j-api, log4j-core, log4j-slf4j-impl and slf4j-api jars to 
> the lib folder. Then you can set up async appenders with log4j2, which are 
> 18x more performant than log4j 1.x appenders.
>
> Jörg
>
> On Tue, Feb 17, 2015 at 5:42 PM, Mohamed Amine ABDESSEMED <
> amin...@gmail.com > wrote:
>
>> We've set up elasticsearch logging as described here logging 
>> elasticsearch events with logstash 
>> 
>>  and 
>> we want to use an AsyncAppender instead of using a SocketAppender directly 
>> in order to avoid freeze times while logging but only xml log4j conf file 
>> format supports AsyncAppender.
>>
>> Is there another way to achieve this since elasticsearch doesn't use xml 
>> conf files?
>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/032b5dc8-fd5b-4714-948d-7fe74a63ee65%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Adding timestamp property

2015-02-18 Thread Roy Zanbel
So...
Still no go ):
My steps are:
1. Created a new index 'aql'
2. Added a mapping for my 'item' type
Now my mappings look like this:
{
  "aql": {
    "mappings": {
      "item": {
        "_timestamp": {
          "enabled": true,
          "store": true
        },
        "properties": {}
      }
    }
  }
}

I've tried reading other threads and Stack Overflow, and I can see a lot of 
people have asked the same question, but none of the proposed solutions 
seemed to work for me...

If anyone can provide a step-by-step guide on how to add a timestamp field to 
every entry by default, I will guarantee his place in heaven!!

Thanks.

BR,
Roy. 
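For what it is worth, the following minimal sequence works against a 1.x cluster (the index name 'aql2' is just an example); note that _timestamp is not part of _source, so it has to be requested explicitly via fields:

```shell
# Create an index whose 'item' type stores an automatic _timestamp.
curl -XPUT 'localhost:9200/aql2' -d '{
  "mappings": {
    "item": { "_timestamp": { "enabled": true, "store": true } }
  }
}'
# Index a document; elasticsearch stamps it with the current time.
curl -XPOST 'localhost:9200/aql2/item/1' -d '{ "name": "test" }'
# _timestamp is not returned by default, so ask for it explicitly.
curl 'localhost:9200/aql2/item/1?fields=_timestamp&pretty'
```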




On Saturday, February 14, 2015 at 3:30:28 PM UTC+2, Roy Zanbel wrote:
>
> Hi,
>
> New to elasticsearch and have a simple question had a hard time finding 
> online.
> I wish to add a timestamp field and later use it in Kibana.
> This is how my settings/ mappings looks like:
> {
>   "aql": {
>     "mappings": {
>       "item": {
>         "_timestamp": {
>           "enabled": true,
>           "store": true
>         },
>         "properties": {}
>       }
>     },
>     "settings": {
>       "index": {
>         "item": {
>           "_timestamp": {
>             "enabled": "true",
>             "store": "true"
>           }
>         },
>         "creation_date": "1423908699031",
>         "number_of_shards": "5",
>         "number_of_replicas": "1",
>         "version": {
>           "created": "1040299"
>         },
>         "uuid": "JqNaClL1Q5-ucG6NI1bvOA"
>       }
>     }
>   }
> }
>
> and after posting new indices I would like to see a timestamp option to 
> filter events in Kibana.
>
> Thanks in advance.
>
> BR,
> Roy.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d1aedcf2-8960-44ba-8e8d-768863a5a9c4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Is it safe to change node names in an existing ElasticSearch cluster

2015-02-18 Thread David Pilato
Yes. It’s safe.
You can do it one at a time.

If you already have data around and don’t want your shards moving during this, 
you should disable allocation.
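For example, with the cluster settings API (a sketch for 1.x; transient settings are reset on a full cluster restart, so use persistent instead if the whole cluster will go down):

```shell
# Stop shard allocation before restarting a node...
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "none" }
}'
# ...restart the renamed node, wait for it to rejoin, then re-enable:
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "all" }
}'
```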


-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet  | @elasticsearchfr 
 | @scrutmydocs 




> On 18 Feb 2015, at 16:14, Jan-Erik Westlund wrote:
> 
> Hi !
> 
> Is it safe to change the node names of my 3 nodes in an existing 
> elasticsearch 1.4.0 cluster ?
> 
> The reason is to get rid of the random names like: Elizabeth "Betsy" 
> Braddock, Franz Kafka, etc...
> 
> Is it just a matter of setting node.name: "server name" in elasticsearch.yml 
> and then restarting the daemon ? 
> Do I do it one node at a time, or do I need to take down the cluster, change 
> all the node names, and then bring the cluster up again ?
> 
> 
> //Jan-Erik
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearch+unsubscr...@googlegroups.com 
> .
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/0bed6a3d-9315-4060-9585-cf68907f844b%40googlegroups.com
>  
> .
> For more options, visit https://groups.google.com/d/optout 
> .

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/1916C068-085D-4F86-B877-FE4F9003B46D%40pilato.fr.
For more options, visit https://groups.google.com/d/optout.


Is it safe to change node names in an existing ElasticSearch cluster

2015-02-18 Thread Jan-Erik Westlund
Hi !

Is it safe to change the node names of my 3 nodes in an existing 
elasticsearch 1.4.0 cluster ?

The reason is to get rid of the random names like: Elizabeth "Betsy" 
Braddock, Franz Kafka, etc...

Is it just a matter of setting node.name: "server name" in elasticsearch.yml 
and then restarting the daemon ? 
Do I do it one node at a time, or do I need to take down the cluster, change 
all the node names, and then bring the cluster up again ?


//Jan-Erik

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0bed6a3d-9315-4060-9585-cf68907f844b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Extending the original _source data

2015-02-18 Thread Leandro Camargo
Hello.

I want to know if it's possible to put a bit more data inside the _source 
attribute, beyond the original fields that are already included from the 
indexed document.
If it is, how could I do it?

Thanks in advance.
-- Leandro

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/49fb2c0c-b300-4511-b55f-4d2c858231ac%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


problem with using uax_url_email

2015-02-18 Thread Marria
Hi everybody,

I want to perform URL extraction from my PDF files. I use mapper-attachment 
plugin to index my PDF files.

In order to be able to perform some regex queries and extract all the urls 
present in a pdf file, I used the uax_url_email tokenizer:

curl -X PUT "localhost:9200/test" -d '{
  "settings" : {
    "index" : {
      "analysis" : {
        "analyzer" : {
          "default" : {
            "type" : "custom",
            "tokenizer" : "uax_url_email",
            "filter" : ["standard", "lowercase", "stop"]
          }
        }
      }
    }
  }
}'



and the mapping:

curl -X PUT "localhost:9200/test/attachment/_mapping" -d '{
  "attachment" : {
    "properties" : {
      "file" : {
        "type" : "attachment",
        "fields" : {
          "title" : { "store" : "yes" },
          "file" : { "term_vector" : "with_positions_offsets", "store" : "yes" }
        }
      }
    }
  }
}'

I indexed some PDF files. The problem is that for one file I get this (while 
the urls in this file start with http://):

[image not preserved in the archive]

For another file, I got this (it leaves the http://):

[image not preserved in the archive]

But the problem is that the urls are not recognized completely, look at this:

[image not preserved in the archive]

Is it caused by the double-column representation in the PDF file?



So, what did I do wrong? How can I fix this and use regexp queries 
successfully to extract all the URLs?


Thank you
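One way to check what the tokenizer actually produces, independent of the
PDF extraction step, is the _analyze API (a sketch; the sample text is made
up):

```shell
# Run the index's custom default analyzer against a known string containing
# a URL; uax_url_email should emit the whole URL as a single token.
curl -XGET "localhost:9200/test/_analyze?pretty" \
  --data-binary "Contact us at http://example.com/page or info@example.com"
```

If the URL survives intact here but not in the indexed PDF content, the
truncation is happening during text extraction (e.g. the double-column
layout), not in the analyzer.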
-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/28cf5243-c60b-4cbc-b488-2da97c65061d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


I can't get my lowercase filter to work :/

2015-02-18 Thread Oscar Cpl
Hi,


I need to have my @message field not analyzed and also to be lowercased 
when stored into ES, the reason is that Kibana can't do proper wildcard 
search because of lowercase_expanded_terms. I tried that but it seems that 
@message is still not lowercase:

{
  "logstash": {
    "order": 0,
    "template": "logstash-*",
    "settings": {
      "index.analysis.analyzer.lowercaseonly.type": "custom",
      "index.query.default_field": "@message",
      "index.refresh_interval": "3s",
      "index.analysis.analyzer.lowercaseonly.filter.0": "lowercase",
      "index.analysis.analyzer.lowercaseonly.tokenizer": "keyword"
    },
    "mappings": {
      "_default_": {
        "dynamic_templates": [
          {
            "string_fields": {
              "mapping": {
                "index": "analyzed",
                "omit_norms": true,
                "type": "string",
                "fields": {
                  "raw": {
                    "index": "not_analyzed",
                    "ignore_above": 256,
                    "type": "string"
                  }
                }
              },
              "match_mapping_type": "string",
              "match": "*"
            }
          }
        ],
        "properties": {
          "host": { "index": "not_analyzed", "type": "string" },
          "received_at": { "index": "not_analyzed", "type": "string" },
          "@message": {
            "index": "not_analyzed",
            "analyzer": "lowercaseonly",
            "type": "string"
          },
          "file": { "index": "not_analyzed", "type": "string" },
          "thread": { "index": "not_analyzed", "type": "string" },
          "path": { "index": "not_analyzed", "type": "string" },
          "class": { "index": "not_analyzed", "type": "string" },
          "logger_name": { "index": "not_analyzed", "type": "string" },
          "method": { "index": "not_analyzed", "type": "string" },
          "gameKey": { "index": "not_analyzed", "type": "string" },
          "@version": { "index": "not_analyzed", "type": "string" },
          "hostname__": { "index": "not_analyzed", "type": "string" }
        },
        "_all": { "enabled": false }
      }
    },
    "aliases": { }
  }
}
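One thing worth checking in a template like the above: with
`"index": "not_analyzed"` on @message, the field is not run through any
analyzer at all, so a custom `analyzer` setting on the same field is
effectively ignored; to have the lowercase keyword analyzer applied, the
field would need `"index": "analyzed"`. The analyzer itself can be verified
in isolation (a sketch; the index name is hypothetical):

```shell
# Verify the custom analyzer emits a single lowercased token:
curl -XGET "localhost:9200/logstash-test/_analyze?analyzer=lowercaseonly" \
  --data-binary "Some MIXED Case Message"
```

With a keyword tokenizer plus lowercase filter, the expected result is one
token: `some mixed case message`.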


-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ddc487cb-94f8-449e-af45-e09f12ed7e30%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Installing shield

2015-02-18 Thread Glen Avery
Thanks! I was using ES 1.4.0, and when I upgraded it worked fine.

Cheers
Glen

On 18 February 2015 at 13:47, Jay Modi  wrote:

> Are you using ES 1.4.3? Is the plugins directory on the same
> filesystem/partition as the rest of elasticsearch?
>
> On Tuesday, February 17, 2015 at 9:53:17 AM UTC-5, Glen Avery wrote:
>>
>> I have tried to install shield for elasticsearch on fedora 21
>>
>> However I get the following error
>>
>> [root@localhost elasticsearch]# bin/plugin -i
>> elasticsearch/license/latest
>> -> Installing elasticsearch/license/latest...
>> Trying http://download.elasticsearch.org/elasticsearch/license/
>> license-latest.zip...
>> Downloading DONE
>> Installed elasticsearch/license/latest into /usr/share/elasticsearch/
>> plugins/license
>>
>> [root@localhost elasticsearch]# bin/plugin -i elasticsearch/shield/latest
>> -> Installing elasticsearch/shield/latest...
>> Trying http://download.elasticsearch.org/elasticsearch/shield/
>> shield-latest.zip...
>> Downloading 
>> ..DONE
>> Installed elasticsearch/shield/latest into /usr/share/elasticsearch/
>> plugins/shield
>> Failed to install elasticsearch/shield/latest, reason: Could not move
>> [/usr/share/elasticsearch/plugins/shield/config] to
>> [/usr/share/elasticsearch/plugins/shield/config]
>>
>> Has anyone else tried installing shield and seen this error?
>>
>>
>> Cheers
>> Glen
>>
>>
>>  --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/ZB9AHpfyV9w/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/ce31ad48-271f-4b85-86d6-f4173850c41d%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>



-- 

mvh
Glen
070 8715166

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEr9Q_ds-LS6NVdowSXxnmanGfpXn9XJHCgfdOh%3DW38BsvtW2w%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Automatic Keywords extraction in ElasticSearch

2015-02-18 Thread Marria
Ok charlie, I understand. 

Thanks a lot.

On Monday, 16 February 2015 at 14:00:38 UTC+1, Marria wrote:
>
> Hi all,
>
> I started using ElasticSearch to index my corpus of PDF files. I succeeded 
> in indexing my PDF files as attachments (base64), and my search queries on 
> the content work well, but I couldn't find how to automatically extract 
> keywords from these files in ElasticSearch. Is it possible to do that with 
> ElasticSearch or not?
>
> Could anybody help with relevant links or advice??
>
> Thanks a lot.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7218672d-8e7b-4f6a-bec5-ba32a33a2993%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


elastic search on t2.micro (Amazon WS)

2015-02-18 Thread Seung Chan Lim
I'm trying to see if I can get elasticsearch (ES) 1.3.8 working with 
couchbase (CB) 3.0.2 on a t2.micro (Amazon WS).

t2.micro has 1 gig of RAM, which isn't a lot, but I'm only doing test 
development on this with not a lot of documents (<1000).

I just installed ES and followed the CB instructions to install the plugin 
and set the XDCR to get the replication going from CB to ES.

I also configured ES to have 0 replicas and 1 shard (hoping this would 
help minimize RAM usage). But I'm still seeing behavior from ES where it 
locks up the server, making it unresponsive, then eventually complaining of 
a lack of memory.

Is there something else I can do to get this working on a t2.micro? 

I'm a complete newbie to ES, and any help would be great,

thank you

slim
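One knob worth checking on a 1 GB machine shared with Couchbase: the JVM
heap size. A hedged sketch for ES 1.x, which reads the ES_HEAP_SIZE
environment variable (the 256m figure and binary path are illustrative, not
recommendations; adjust for your install):

```shell
# Cap the JVM heap before starting elasticsearch:
export ES_HEAP_SIZE=256m
/usr/share/elasticsearch/bin/elasticsearch -d
```

For a Debian/RPM service install, the same variable can usually be set in
/etc/default/elasticsearch or /etc/sysconfig/elasticsearch instead.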

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/40578480-09dd-4571-9108-b2675aa5ce1b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Installing shield

2015-02-18 Thread Jay Modi
Are you using ES 1.4.3? Is the plugins directory on the same 
filesystem/partition as the rest of elasticsearch?

On Tuesday, February 17, 2015 at 9:53:17 AM UTC-5, Glen Avery wrote:
>
> I have tried to install shield for elasticsearch on fedora 21
>
> However I get the following error
>
> [root@localhost elasticsearch]# bin/plugin -i elasticsearch/license/latest
> -> Installing elasticsearch/license/latest...
> Trying 
> http://download.elasticsearch.org/elasticsearch/license/license-latest.zip.
> ..
> Downloading DONE
> Installed elasticsearch/license/latest into 
> /usr/share/elasticsearch/plugins/license
>
> [root@localhost elasticsearch]# bin/plugin -i elasticsearch/shield/latest
> -> Installing elasticsearch/shield/latest...
> Trying 
> http://download.elasticsearch.org/elasticsearch/shield/shield-latest.zip.
> ..
> Downloading 
> ..DONE
> Installed elasticsearch/shield/latest into 
> /usr/share/elasticsearch/plugins/shield
> Failed to install elasticsearch/shield/latest, reason: Could not move 
> [/usr/share/elasticsearch/plugins/shield/config] to 
> [/usr/share/elasticsearch/plugins/shield/config]
>
> Has anyone else tried installing shield and seen this error?
>
>
> Cheers
> Glen
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ce31ad48-271f-4b85-86d6-f4173850c41d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


general question shard initialization in cluster

2015-02-18 Thread Marc
I have a cluster with 2 data nodes. Each index has 5 primary shards and 1 
replica. Therefore, the data on both nodes should be identical, at least 
for indices that are not changed anymore.
Before I restart/shut down a node I disable routing allocation, and once the 
node is up again I reactivate it.
In most cases the initialization of a shard is really quick as the data is 
already there, but sometimes it seems that all the data is recreated even 
though nothing was changed about this index in the meantime.
Is that normal? And how can I ensure that the existing replica/primary shard 
on the rebooted node is reused?
Did I miss something fundamental?

Cheers
Marc
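Whether a shard was reused locally or copied over the wire can be observed
while the node rejoins (a sketch; the _cat/recovery endpoint is available
from Elasticsearch 1.x):

```shell
# Watch recovery progress; the files_percent / bytes_percent columns show
# how much of each shard is being re-copied versus reused from disk:
curl -XGET "localhost:9200/_cat/recovery?v"
```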

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a34c335e-9768-4ec7-bff5-dd75ac485249%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Range Aggregation - Default 'key' [ES 1.1.1]

2015-02-18 Thread Colin Goodheart-Smithe
How many nodes do you have in your cluster? 

If you have 2 nodes then it would suggest that there is something different 
between the two nodes which is causing this and your request is alternating 
between each of the nodes. 

My first guess would be that the nodes are accidentally on different 
versions. 

I would start by using the nodes API to confirm the nodes are all on the 
same version of Elasticsearch:

curl -XGET "http://localhost:9200/_nodes/settings"


Hope this helps

Colin 

On Tuesday, 17 February 2015 17:39:16 UTC, Scott Rallya wrote:
>
> In running the following range aggregation [choosing not to specify a 
> 'key' for each range]
>
> {
>   "query": {
> "match_all": {}
>   },
>   "aggs": {
> "duration": {
>   "range": {
> "field": "duration",
> "ranges": [
>   { "to": 60 },
>   { "from": 60, "to": 300},
>   { "from": 300, "to": 900},
>   { "from": 900, "to": 3600},
>   { "from": 3600 }
>  ]
>   }
> }
>   }
> }
>
> I seem to be alternating between two result sets, the first request will 
> return:
> "aggregations": {
>   "duration": {
>     "buckets": [
>       { "to": 60, "doc_count": 157680 },
>       { "from": 60, "to": 300, "doc_count": 181398 },
>       { "from": 300, "to": 900, "doc_count": 39937 },
>       { "from": 900, "to": 3600, "doc_count": 8809 },
>       { "from": 3600, "doc_count": 298 }
>     ]
>   }
> }
>
> And the subsequent request will return:
> "aggregations": {
>   "duration": {
>     "buckets": [
>       { "key": "*-60.0", "to": 60, "doc_count": 157680 },
>       { "key": "60.0-300.0", "from": 60, "to": 300, "doc_count": 181398 },
>       { "key": "300.0-900.0", "from": 300, "to": 900, "doc_count": 39937 },
>       { "key": "900.0-3600.0", "from": 900, "to": 3600, "doc_count": 8809 },
>       { "key": "3600.0-*", "from": 3600, "doc_count": 298 }
>     ]
>   }
> }
>
> Each request afterwards alternates between "key" being absent from each 
> bucket in the list and then being present. Was hoping someone might have 
> some insight as to what is going on just to satisfy my own curiosity.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/65724db4-f1a0-4015-8a3d-924c7b1f73a2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Using q=* or match_all to search an index return partial records

2015-02-18 Thread JOSE R VELEZ
Just what I was looking for. Thanks both for the help. 


On Tuesday, February 17, 2015 at 12:19:45 PM UTC-8, sri krishna wrote:

> Default it is 10
>
> Please check  
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-from-size.html
>  
> for modifying the same.
> Also you can try with scan and scroll, 
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/scan-scroll.html
>  
> to avoid OOM issues!
>
> On Wednesday, 18 February 2015 01:32:50 UTC+5:30, JOSE R VELEZ wrote:
>>
>> Hi, 
>>
>> I'm new to ElasticSearch and I'm following the Getting Started tutorial. 
>> Right now I'm looking at "the search api" section (link: 
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_the_search_api.html).
>>  
>>
>>
>> I have successfully added the records from accounts.json sample data to 
>> an ElasticSearch index (as taught in the section before "the search api"), 
>> but when I search for it using 
>>
>> curl 'localhost:9200/bank/_search?q=*&pretty'
>>
>>
>> ... the response states that it found a total of 1000 hits, but only 
>> shows 10 of them. 
>>
>> Is this regular behavior? If so, is there a way to override it so it 
>> returns all the hits instead of just 10?
>>
>>
>>
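The from/size override described above can be applied directly to the
tutorial's query (a sketch, assuming the bank index from the Getting
Started guide):

```shell
# Return up to 1000 hits instead of the default 10:
curl "localhost:9200/bank/_search?q=*&size=1000&pretty"
```

For result sets too large to page through with from/size, the scan and
scroll approach linked above avoids loading everything in one response.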

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/33ff5453-e677-49fa-9c69-e5e74930b980%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [QUESTIONS] - failover mechanisms and consistency

2015-02-18 Thread Nicolas Harraudeau
The reason I am concerned is this sentence in an ongoing issue description 
found 
at http://www.elasticsearch.org/guide/en/elasticsearch/resiliency/current/ :
"If a network partition separates a node from the master, there is some 
window of time before the node detects it. This window is extremely small 
if a socket is broken. More adversarial partitions, for example, silently 
dropping requests without breaking the socket can take longer (up to 3x30s 
using current defaults)"

The client I use is elasticsearch.js. Its documentation says that it will 
round-robin the requests on its connections and I would like to know if it 
can successfully send the GET request to an unallocated node.
If this can happen, what are the recommendations to prevent this situation? 
sending request only to non-data nodes?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/4554281c-622e-48e2-b2f5-aed54c2b4670%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: One alias with multiple indices

2015-02-18 Thread Colin Goodheart-Smithe
Hi,

You can't use aliases that point to multiple indexes for indexing 
operations. This is because there is no way for elasticsearch to know which 
index in your alias you want to apply the indexing operation to.

What you can do instead is create another alias called something like 
'data-write' and point it at the index you want to write data to. 
This way your application can always use the alias (data-write) to 
write data and the alias (data-active) to read data. Then you can change 
the index in 'data-write' when the index you want to write to 
changes (making sure you do the add and remove in a single request, example 
in [1]), and you can change the indexes in 'data-active' when the indexes 
you want to read/search from change.

Hope this makes sense

Colin

[1] 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-aliases.html
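The atomic add-and-remove might look like this (a sketch; the index and
alias names follow the question):

```shell
# Move the data-write alias from data-v3 to data-v4 in one atomic request,
# so there is never a moment when the alias points at zero or two indices:
curl -XPOST "localhost:9200/_aliases" -d '{
  "actions": [
    { "remove": { "index": "data-v3", "alias": "data-write" } },
    { "add":    { "index": "data-v4", "alias": "data-write" } }
  ]
}'
```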

On Wednesday, 18 February 2015 09:43:53 UTC, tao hiko wrote:
>
> Hello,
>
> I created an alias that contains multiple indices. I can get data through it 
> without any error, but inserting/updating data via the alias shows the 
> message below:
>
> {
>"error": "ElasticsearchIllegalArgumentException[Alias [data-active] has 
> more than one indices associated with it [[data-v4, data-v2, data-v3]], 
> can't execute a single index op]",
>"status": 400
> }
>
> Actually, inserting and updating only work against data-v4, but I need to 
> do it via the alias so that I can reuse the application code without any 
> change.
>
> Is it possible to configure something to do like that?
>
>
> Thank you,
> Hiko
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/8b736701-4ce6-49aa-b44d-1cd88418dcdd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[dev] building elasticsearch - maven issues

2015-02-18 Thread David Pilato
Dear contributors (if you are an elasticsearch user, you can skip this email),


A commit was pushed yesterday which changes the names of the maven repositories 
in the elasticsearch pom file. When you sync to the latest on the master branch, 
you may find that your maven build fails with the following:

[ERROR] Failed to execute goal on project elasticsearch: Could not resolve 
dependencies for project org.elasticsearch:elasticsearch:jar:2.0.0-SNAPSHOT: 
Failed to collect dependencies at 
org.apache.lucene:lucene-test-framework:jar:5.1.0-snapshot-1657571: Failed to 
read artifact descriptor for 
org.apache.lucene:lucene-test-framework:jar:5.1.0-snapshot-1657571: Could not 
transfer artifact 
org.apache.lucene:lucene-test-framework:pom:5.1.0-snapshot-1657571 from/to 
lucene-snapshots (https://download.elasticsearch.org/lucenesnapshots/1657571): 
Access denied to: 
https://download.elasticsearch.org/lucenesnapshots/1657571/org/apache/lucene/lucene-test-framework/5..., 
ReasonPhrase: Forbidden. -> [Help 1]

This is caused by a commit which basically tries to solve a Maven issue 
(MNG-5185) you can hit if you also try to build plugins.

If you have trouble compiling elasticsearch or a plugin using `mvn compile` 
and hit an `Access denied to: [URL_HERE], ReasonPhrase: Forbidden. -> [Help 1]` 
error, you can remove the related maven files:

find ~/.m2/repository -name _remote.repositories -exec rm -v {} \;
find ~/.m2/repository -name _maven.repositories -exec rm -v {} \;

Another option is to tell Maven not to use those files with the 
--legacy-local-repository (-llr) flag:

mvn compile -llr

Note that the latter is only a temporary fix.


I hope this will help.
-- 

David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs



-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/C55FCCE3-2291-4684-9CED-F86F7A885D72%40pilato.fr.
For more options, visit https://groups.google.com/d/optout.


Help with 4 node cluster

2015-02-18 Thread sysads
Hi

I need help setting up a 4-node elasticsearch cluster. I have installed and 
configured ES on all 4 nodes, but I am lost as to what the configuration in 
elasticsearch.yml should be:

- if I want to have all 4 nodes be both master-eligible and data nodes
- to make node A hold a primary shard while node B holds its replica, and 
node C hold a primary shard while node D holds its replica.

Thanks
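A per-node configuration for the first point might look like this (a
sketch; with 4 master-eligible nodes, a minimum_master_nodes quorum of 3
avoids split-brain; the path is the Debian/RPM default and may differ on
your install):

```shell
# Append the role settings to elasticsearch.yml on each of the 4 nodes:
cat >> /etc/elasticsearch/elasticsearch.yml <<'EOF'
node.master: true
node.data: true
discovery.zen.minimum_master_nodes: 3
EOF
```

On the second point: primary/replica placement is decided by the cluster's
shard allocator, not assigned to named nodes in elasticsearch.yml, so a
fixed "node A primary, node B replica" layout is not something the config
expresses directly.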

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ff3ad49c-f305-44cb-a473-523ad6758835%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Operators NEARx, BEFOR, AFTER, FIRSTx, LASTx

2015-02-18 Thread Petr Janský
Hi Lukas,

thank you for your answer. I checked the "Proximity Matching" chapter and 
"match_phrase", and it's what I was looking for. I'm only not able to find a 
way to create queries like:

   1. Obama BEFORE Iraq - the first word (not term) is before the second in 
   the field text
   2. "President Obama" AFTER Iraq - the phrase "President Obama" is after 
   Iraq in the field text

In other words, match_phrase doesn't have an in_order parameter like 
span_near, and for span_near I have to use terms - I have to run the 
analyzer on the words beforehand.

Do you have any idea how to implement these queries?

Thanks
Petr
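An ordered span query for the first case might be sketched as follows
(assuming the terms have already been lowercased to match the analyzed
index; the index and field names, news and text, are hypothetical):

```shell
# "obama" must appear before "iraq" (in_order: true) within 100 positions:
curl -XGET "localhost:9200/news/_search" -d '{
  "query": {
    "span_near": {
      "clauses": [
        { "span_term": { "text": "obama" } },
        { "span_term": { "text": "iraq" } }
      ],
      "slop": 100,
      "in_order": true
    }
  }
}'
```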

Dne pondělí 19. ledna 2015 10:23:21 UTC+1 Lukáš Vlček napsal(a):
>
> Hi Petr,
>
> let me try to address some of your questions:
>
> ad 1) I am not sure I understand what you mean. If you want to use span 
> type of query then simply use it instead of query string query. Especially, 
> if you pass user input into the query then it is recommended NOT to use 
> query string query and you should consider using different query type (like 
> span query in your case).
>
> ad 2) Not sure I fully understand but I can see match for some of those 
> requested features in span queries. Like "slop". I would recommend you to 
> read through chapters of "Proximity Matching" [1] to see how you can use 
> "slop".
>
> ad 3) The input that goes into span queries can go through text analysis 
> process (as long as I am not mistaken). The fact that there are term 
> queries behind the scene does not mean you can not process your analysis 
> first.
>
> May be if you can share some of your configs/documents/queries we can help 
> you more.
>
> [1] 
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/proximity-matching.html
>
> Regards,
> Lukas
>
On Mon, Jan 19, 2015 at 10:02 AM, Petr Janský wrote:
>
>> Noone? :-(
>>
>> Petr
>>
>> On Tuesday, 13 January 2015 at 15:37:18 UTC+1, Petr Janský wrote:
>>>
>>> Hi there,
>>>
>>> I'm looking for a way how to access span_near and span_first 
>>> functionality to users via search box in gui that uses query string query.
>>>
>>>1. Is there any easy way how to do it?
>>>2. Will ElasticSeach folks implement operators like NEARx, BEFOR, 
>>>AFTER, FIRSTx, LASTx to be able search by (using query string):
>>>   - specific max word distance between key words
>>>   - order of key words
>>>   - word position of key word in field from start and end of field 
>>>   text
>>>3. Span queries enable to use only terms, is there a way how to use 
>>>words that will be analysed by lang. analyser - stemming etc.?
>>>
>>>
>>> Thanks
>>> Petr
>>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/f90a0eba-1b61-4a23-a2af-ec6a0c5e461f%40googlegroups.com
>>  
>> 
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/8e88fb60-0e1c-423e-8208-a5e01206c620%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Strange behavior of ElasticSearch

2015-02-18 Thread Alexandre

Hi Mark,

I currently have five indexes with 5 shards per index :

1 : 7.24MB
2 : 257MB
3 : 623kB
4 : 30.2MB
5 : 629MB

I think it's nothing for ElasticSearch.

I activated the slow log and found that the Bing HTTP robots had been 
running searches for 50 hours !


The problem is fixed now. Stats search is now 3.5ms.

Thank you Mark.

Good day.

Alex.
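Enabling the search slow log that helped here can be sketched as a
per-index dynamic setting (the thresholds are illustrative):

```shell
# Log any query phase slower than 1s at WARN, and 500ms at INFO,
# across all indices:
curl -XPUT "localhost:9200/_settings" -d '{
  "index.search.slowlog.threshold.query.warn": "1s",
  "index.search.slowlog.threshold.query.info": "500ms"
}'
```

Entries then show up in the index search slow log file, including the query
source and the remote origin of expensive requests.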


On 17/02/15 23:15, Mark Walkom wrote:

How many indices in the cluster, how many shards, how much data is that
in GB?

On 18 February 2015 at 01:17, Alexandre <in...@opendoc.net> wrote:

Hello everyone,

for 2 days, I have a strange behavior with ElasticSearch. This is
the context:

- 1 clsuter with 3 node
- os : debian
- version 1.4.0 (official package)
- OS mem : 12 Go
- JMX : 8 Go
- CPU : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
- Nb CPU : 9

From 7 am Monday, I saw an increase in RAM usage. Recycling
the JVM went from once per day to once every 2 hours. I have no
explanation. The CPU is overused. The response time is 10 ms instead
of 3.5 ms. Restarting each node has not changed anything.

Do you have a method or tools to diagnose ElasticSearch behavior ?
Should I upgrade to version 1.4.3 ?

Thank you.

Alex

--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/54E34D69.6070800%40opendoc.net.
For more options, visit https://groups.google.com/d/optout.


--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to elasticsearch+unsubscr...@googlegroups.com
.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CAEYi1X959_QgNPKb_wqG8M7Hbb-N8T%2BYJYfNyTp8ofGAmOhcYA%40mail.gmail.com
.
For more options, visit https://groups.google.com/d/optout.




Elasticsearch cluster behind Azure public load balancer

2015-02-18 Thread Subodh Patil



I am trying to set up an ES cluster behind an Azure load balancer / cloud
service. The cluster is 3 nodes with no specific data/client/master node
settings. By default, 2 nodes are elected as master and 1 as a data node.

As requests (create/update/search) from the application come to the Azure
load balancer on port 9200, which is load-balanced across all 3 VMs, a
request can go to any VM.

Will the master node be able to serve the requests?

Many articles say that you don't need a load balancer for an ES cluster,
just use a client node, but then that becomes a single point of failure,
as an Azure VM can go down at any point in time. So load balancing is
required mainly for high availability from an infrastructure point of
view. Please suggest a cluster setup and which nodes (data or client) to
put behind the load balancer.


ES version 1.4.1 on Windows Server 2012 R2 VM
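In ES 1.x every node can coordinate requests on port 9200, so master and data nodes behind the balancer will all accept create/update/search traffic. To avoid the single point of failure of one client node, one option is running two or more dedicated client nodes and putting only those behind the Azure balancer. A minimal sketch of the client-node settings (names and topology are illustrative, not from this thread):

```yaml
# elasticsearch.yml for a dedicated client (coordinating) node, ES 1.x -- sketch only
node.master: false   # never eligible to become master
node.data: false     # holds no shards
http.enabled: true   # serves REST traffic on 9200 and routes to data nodes
```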
 



ES Date Histogram Bucket max item count

2015-02-18 Thread neil . varnas


I have time-series based data.

I'm performing a date histogram aggregation with interval '1h' and sorting
the results by "_count". Is it possible to limit the item count in the
bucket so it would only return the first one, the item with the maximum
count, and not bother about the others?

Thanks
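As far as I know, the 1.x `date_histogram` aggregation has no per-bucket size limit the way `terms` does, so a common workaround is to return all hourly buckets and keep only the top one client-side. A hedged sketch (the response values below are made up for illustration; only the response shape is the standard ES format):

```python
# Post-process a date_histogram response to keep only the bucket with the
# highest doc_count. "per_hour" is a hypothetical aggregation name.
response = {
    "aggregations": {
        "per_hour": {
            "buckets": [
                {"key_as_string": "2015-02-18T10:00:00", "doc_count": 40},
                {"key_as_string": "2015-02-18T11:00:00", "doc_count": 95},
                {"key_as_string": "2015-02-18T12:00:00", "doc_count": 12},
            ]
        }
    }
}

buckets = response["aggregations"]["per_hour"]["buckets"]
top = max(buckets, key=lambda b: b["doc_count"])  # the single busiest hour
print(top["key_as_string"], top["doc_count"])  # -> 2015-02-18T11:00:00 95
```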



Re: Automatic Keywords extraction in ElasticSearch

2015-02-18 Thread Charlie Hull

On 17/02/2015 10:21, Marria wrote:

Hi Charlie,

I am really grateful to you.

Well, my supervisor is not available this week to ask him why he wants
this. But I think he wants to classify his scientific documents by topics.

He wants all this (extraction of keywords, themes/topics,
classification of documents, etc.) integrated in Elasticsearch directly.
So, after indexing in Elasticsearch, he wants to be able to extract
them (for one document or a group of documents). I am not sure if it is
feasible to implement all this in Elasticsearch; I am not even sure that
Elasticsearch is the best tool for these purposes.


You can *index* this extra information (metadata) in Elasticsearch, and 
use it to search/filter/facet to extract the documents accordingly. 
However *creating* this extra information is something you would do as 
part of the document ingestion process, before indexing with Elasticsearch.


Classification is another thing entirely - there are lots of ways to do 
this - manual, naive Bayes, using stored expressions... but again, this 
is adding metadata to a document.


I think you have lots of things to do here, and your first step would be 
to understand the definition of each, and what is possible with which 
tool.


Cheers

Charlie



What do you think?

Cheers :)






--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk



Elasticsearch performance tuning

2015-02-18 Thread Deva Raj
Hi All,

In a single-node Elasticsearch setup along with Logstash, we tested parsing 
20 MB and 200 MB files into Elasticsearch on different types of AWS 
instances, i.e. Medium, Large and XLarge.

Environment details: Medium instance, 3.75 GB RAM, 1 core, storage: 4 GB SSD, 
64-bit, network performance: moderate
Instance running: Logstash, Elasticsearch

Scenario: 1

**With default settings** 
Result:
20 MB logfile: 23 mins, events/second: 175
200 MB logfile: 3 hrs 3 mins, events/second: 175


Added the following to settings:
Java heap size : 2GB
bootstrap.mlockall: true
indices.fielddata.cache.size: "30%"
indices.cache.filter.size: "30%"
index.translog.flush_threshold_ops: 5
indices.memory.index_buffer_size: 50%

# Search thread pool
threadpool.search.type: fixed
threadpool.search.size: 20
threadpool.search.queue_size: 100

**With added settings** 
Result:
20 MB logfile: 22 mins, events/second: 180
200 MB logfile: 3 hrs 7 mins, events/second: 180

Scenario 2

Environment details: R3 Large, 15.25 GB RAM, 2 cores, storage: 32 GB SSD, 
64-bit, network performance: moderate
Instance running: Logstash, Elasticsearch

**With default settings** 
Result:
20 MB logfile: 7 mins, events/second: 750
200 MB logfile: 65 mins, events/second: 800

Added the following to settings:
Java heap size: 7 GB
other parameters same as above

**With added settings** 
Result:
20 MB logfile: 7 mins, events/second: 800
200 MB logfile: 55 mins, events/second: 800

Scenario 3

Environment details: 
R3 High-Memory Extra Large r3.xlarge, 30.5 GB RAM, 4 cores, storage: 32 GB SSD, 
64-bit, network performance: moderate
Instance running: Logstash, Elasticsearch

**With default settings** 
Result:
20 MB logfile: 7 mins, events/second: 1200
200 MB logfile: 34 mins, events/second: 1200

Added the following to settings:
Java heap size: 15 GB
other parameters same as above

**With added settings** 
Result:
20 MB logfile: 7 mins, events/second: 1200
200 MB logfile: 34 mins, events/second: 1200

I wanted to know:

1. What is the benchmark for this performance?
2. Does the performance meet the benchmark, or is it below it?
3. Why, even after I increased the Elasticsearch JVM heap, am I not able to 
see a difference?
4. How do I monitor Logstash and improve its performance?

I appreciate any help on this, as I am new to Logstash and Elasticsearch. 
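One useful sanity check on the numbers above: events/second times duration gives the implied event counts, and if the rate stays flat while the file grows 10x, the pipeline is rate-limited upstream (often Logstash parsing) rather than by ES heap, which would partly explain question 3. A sketch using the Scenario 1 figures only:

```python
# Cross-check Scenario 1: events/sec * duration ~= total events processed.
# 20 MB at 175 events/sec for 23 minutes:
events_small = 175 * 23 * 60            # implied events in the 20 MB file
# 200 MB at 175 events/sec for 3 h 3 min (183 minutes):
events_large = 175 * (3 * 60 + 3) * 60  # implied events in the 200 MB file

# The rate is identical for both files, so total time scales with event
# count -- a sign of a constant-throughput bottleneck, not a heap limit.
print(events_small, events_large)  # -> 241500 1921500
```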



One alias with multiple indices

2015-02-18 Thread tao hiko
Hello,

I created an alias that contains multiple indices. I can get data through 
it without any error, but inserting/updating data via the alias shows the 
message below:

{
   "error": "ElasticsearchIllegalArgumentException[Alias [data-active] has 
more than one indices associated with it [[data-v4, data-v2, data-v3]], 
can't execute a single index op]",
   "status": 400
}

Actually, inserting and updating should work on data-v4 only, but I need 
to do it via the alias so that I can reuse the application code without 
any change.

Is it possible to configure something to work like that?


Thank you,
Hiko
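ES 1.x rejects single-index write operations against a multi-index alias, and I'm not aware of a setting that changes that. A common workaround is two aliases: one over all versions for reads, and one over only the current index for writes. A hedged sketch of the `_aliases` request body (the "data-write" alias name is an example, not from this thread):

```python
import json

# Two-alias setup: the app reads via "data-active" (all versions) and
# writes via "data-write" (data-v4 only). When rolling to data-v5, move
# both aliases atomically in one _aliases call.
actions = {
    "actions": [
        {"add": {"index": "data-v2", "alias": "data-active"}},
        {"add": {"index": "data-v3", "alias": "data-active"}},
        {"add": {"index": "data-v4", "alias": "data-active"}},
        {"add": {"index": "data-v4", "alias": "data-write"}},
    ]
}
body = json.dumps(actions)
# POST this body to /_aliases, e.g.:
#   curl -XPOST "http://localhost:9200/_aliases" -d "$body"
print(body)
```

The application code change is then limited to the index name used for writes, which stays fixed ("data-write") across reindexing.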



Elasticsearch aggregations for analytics

2015-02-18 Thread Itai Yaffe
Hey,
We have an Elasticsearch cluster in production with 20 nodes, which holds a 
few TBs of data and loads millions of documents a day.
We use Elasticsearch for analytics purposes, and the main thing we're 
interested in is counting unique users.

We started production with Elasticsearch 0.9.x, when there was no 
cardinality aggregation; therefore we were bound to the document 
structure seen below.
Most of our queries are looking for the unique users count based on a date 
range and specific segments.
Some of our analytic UI screens require executing hundreds of queries in 
parallel and one even requires thousands of queries.

When migrating to v1.4, we hoped to start using the aggregation feature, 
but even with doc_values enabled, we experience aggregation times of 
*minutes*...
We're running on c3.8xlarge EC2 instances with 60GB RAM, of which 30GB are 
allocated to ES heap.
We have 6 indexes with 2 replicas each, each index has 20 shards.
Each aggregation/query is performed against a single index (see aggregation 
example below).

Has anyone dealt with such use cases? 

Thanks!

*Document structure* :
{
 "user": {
"_ttl": {
   "enabled": true
},
"properties": {
   "events": {
  "type": "nested",
  "properties": {
 "event_time": {
"type": "date",
"format": "dateOptionalTime",
"doc_values" : true
 },
 "segments": {
"properties": {
   "segment": {
  "type": "string",
  "index": "not_analyzed",
  "doc_values" : true
   }
}
 }
  }
   }
}
 }
}

For example :
{
  "_index": "...",
  "_type": "...",
  "_id": "...",
  "_version": 1,
  "_score": 1,
  "_source": {
"events": [
  {
"event_time": "2014-11-03",
"segments": [
  {
"segment": "ALICE"
  },
  {
"segment": "BOB"
  }
]
  },
  {
"event_time": "2014-11-04",
"segments": [
  {
"segment": "RON"
  },
  {
"segment": "YULA"
  }
]
  }
]
  }
}


*Aggregation example* :
{
  "size": 0,
  "query": {
    "nested": {
      "query": {
        "filtered": {
          "query": { "match_all": {} },
          "filter": {
            "bool": {
              "must": [
                {
                  "range": {
                    "events.event_time": {
                      "from": "2014-11-17",
                      "to": "2014-11-24",
                      "include_lower": true,
                      "include_upper": true
                    }
                  }
                }
              ]
            }
          }
        }
      },
      "path": "events"
    }
  },
  "aggregations": {
    "nested": {
      "nested": { "path": "events" },
      "aggregations": {
        "segments": {
          "terms": {
            "field": "events.segments.segment",
            "size": 0
          },
          "aggregations": {
            "uu": { "reverse_nested": {} }
          }
        }
      }
    }
  }
}
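Since 1.4 does ship the cardinality aggregation the original schema predates, a hedged alternative worth benchmarking is counting uniques directly with it instead of an unbounded `terms` over all segments. This sketch assumes a `user_id` field on the documents, which the thread's mapping does not show, so it would require a mapping change:

```python
import json

# Sketch only: approximate unique-user count via the cardinality
# aggregation. "user_id" and "unique_users" are assumed names, not from
# the original mapping. precision_threshold trades memory for accuracy.
query = {
    "size": 0,
    "query": {
        "nested": {
            "path": "events",
            "query": {
                "range": {
                    "events.event_time": {"from": "2014-11-17", "to": "2014-11-24"}
                }
            },
        }
    },
    "aggregations": {
        "unique_users": {
            "cardinality": {
                "field": "user_id",
                "precision_threshold": 1000
            }
        }
    },
}
print(json.dumps(query))
```

Unlike `terms` with `"size": 0`, cardinality does not materialize every segment bucket, which is usually what makes these queries take minutes at TB scale.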



Percolate - limit filtered percolating queries to the best matches

2015-02-18 Thread Arnaud LARROQUE
Hi, 

Let's say I have this kind of percolate query :

GET my-index/my-type/_percolate
{
  "doc": {
   ...
  },
  "query": {
"match": {
 ...
}
  },
  "track_scores": true,
  "sort": "_score"
}

I have a query option which limits the number of queries to percolate my 
doc against. 
In fact, I'm only interested in the best result (_score) of the matching 
percolate queries.
Is there any way to limit the number of filtered percolate queries to 
the best match?

Thanks in advance for your help

Regards,

Arnaud 
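The 1.x percolate API accepts a `size` parameter alongside `track_scores` and `sort`, so asking for only the top-scoring match should be possible. A hedged sketch of the request body (the doc fields are placeholders; behavior should be verified against your cluster):

```python
import json

# Sketch: request only the single best-scoring matching percolate query
# by sorting on _score and limiting the response to one match.
body = {
    "doc": {"title": "example document"},  # hypothetical document fields
    "track_scores": True,
    "sort": "_score",
    "size": 1,                             # return only the top match
}
# GET my-index/my-type/_percolate with this body:
print(json.dumps(body))
```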



Re: Ngram not working for multivalued field

2015-02-18 Thread sri krishna
We have live query terms; a query will be, for example: title:test AND
title:west AND desc:world AND desc:hello. Our objective is to avoid
terms in the query having document frequency > 10 within the specific
field, i.e. if title:west has df 11 and desc:world has df 20, Elasticsearch
should internally change the query to title:west AND
desc:hello. Let us know if this can be done in an effective way, as our
search queries are very high in number!
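On question 1 from the quoted message: a first debugging step is to check what tokens the index analyzer actually emits, e.g. via `curl "http://localhost:9200/xxx-test/_analyze?analyzer=str_index_analyzer" -d 'a1b2c'`. The expected nGram output can be reproduced locally:

```python
# Reproduce what the "substring" nGram filter (min_gram 2, max_gram 5)
# applied after the keyword tokenizer should emit for one value, to check
# whether a partial term like "1b2" is expected to be in the index.
def ngrams(text, min_gram=2, max_gram=5):
    out = []
    for n in range(min_gram, max_gram + 1):
        for i in range(len(text) - n + 1):
            out.append(text[i:i + n])
    return out

tokens = ngrams("a1b2c")
print(tokens)
print("1b2" in tokens)  # -> True: the partial term should be indexed
```

If `_analyze` shows the expected tokens but the search still fails, note that the mapping in the quoted message defines the analyzers on `lists.url_domain`, while the sample documents index a top-level `url_domain` field; a dynamically mapped top-level field would not get the custom analyzers, which could explain the behavior (a guess from the snippets, worth verifying with `_mapping`).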

On Wed, Feb 18, 2015 at 1:39 AM, sri krishna  wrote:

> Hi,
>
>
>Couple of questions:
>
> 1) We are trying to create an index having an analyzed multi value field
> (filter used is n-gram). But we are not able to query the partial values.
> But when we have single valued field for same filter everything is working
> as expected, i.e able to retrieve partial query results as well.
>
> Create index:
> curl -X PUT "http://localhost:9200/-test" -d '{
>   "mappings" : {
> "test" : {
>   "properties" : {
>   "lists" : {
>   "properties" : {
> "url_domain" : {
>   "type" : "string",
>   "search_analyzer" : "str_search_analyzer",
>   "index_analyzer" : "str_index_analyzer"
>}
>  }
> }
>   }
> }
>   },
>
>   "settings" : {
> "analysis" : {
>   "analyzer" : {
>
>   "str_search_analyzer" : {
>   "tokenizer" : "keyword",
>   "filter" : ["lowercase"]
> },
> "str_index_analyzer" : {
>   "tokenizer" : "keyword",
>   "filter" : ["lowercase", "substring"]
> }
>   },
>
>   "filter" : {
> "substring" : {
>   "type" : "nGram",
>   "min_gram" : 2,
>   "max_gram"  : 5
> }
>   }
> }
>   }
> }'
>
>
> Sample values inserted: curl -X POST "http://localhost:9200/xxx-test/test"
> -d '{ "url_domain" : "slkd" }' curl -X POST
> "http://localhost:9200/xxx-test/test" -d '{ "url_domain" :
> ["a1b2c","c1de"] }'
>
> Search query used and got some results as expected (this is an
> entire-string match): curl "http://localhost:9200/xxx-test/_search" -d
> '{ "query": { "match": {"url_domain": "a1b2c"} } }'
>
> Search query used but didn't give any results (this is a partial match):
> curl "http://localhost:9200/xxx-test/_search" -d '{ "query": { "match":
> {"url_domain": "1b2"} } }'. As the field is nGram analyzed, we are
> expecting a result for this query. Let us know if our understanding is
> wrong?
>
> 2) We have a query with collection of dynamic terms eg: title:test AND
> title:west AND desc:world AND desc:hello, now our objective is to avoid
> terms in the query having document frequency > 10 within the specific
> field. I.,e if title:west has df as 11 and desc:world has df 20, elastic
> search should be internally changing the query to title:west AND
> desc:hello, let us know if this can be done in effective way, as our search
> queries are very high!
>
> 3) We are using ngram for prefix,suffix and fuzzy queries are there any
> effective ways to store the index for the same?
>
>  --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/gTXGdXAXi_Y/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/034fca16-9fb0-4830-8fec-9184a42ba866%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>



Re: [ES-1.4.0] Changing linux user ID

2015-02-18 Thread David Pilato
Well, if you are using a shared FS for snapshot/restore, you need to make 
sure that each node's user can write to this dir. 
Try from all machines to write a file when running as es_01 and es_02.
Maybe you could add them to the same group and give write privileges to 
the group?

David
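A sketch of the group-permission fix suggested above, using a stand-in directory rather than the real NFS mount (paths are examples only):

```python
import os
import stat
import tempfile

# Every node's user (es_01, es_02) must be able to write to the shared
# snapshot directory. Grant group write plus the setgid bit, so files
# created inside inherit the directory's group; then put both ES users
# in that group. Here we demonstrate on a temporary stand-in directory.
repo = tempfile.mkdtemp(prefix="es-snapshots-")
os.chmod(repo, 0o2775)  # rwxrwsr-x: owner+group write, setgid on the dir

mode = stat.S_IMODE(os.stat(repo).st_mode)
print(oct(mode))
print(bool(mode & stat.S_IWGRP))  # group members can write
```

On the real NFS export you would also `chgrp` the directory to the shared group and make sure the NFS server does not squash the UIDs/GIDs involved.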

> Le 18 févr. 2015 à 09:09, Yarden Bar  a 
> écrit :
> 
> Hi All,
> 
> I encountered a 'Permission Denied' error while using ES Snapshot API to add 
> a 'FS' repository pointing a NFS shared folder.
> 
> My cluster structure is:
> NFS_MACHINE - user id is 600
> ES_01 - user ID that runs ES is 1000
> ES_02 - user ID that runs ES is 600
> The issue seems to be that the difference in user IDs between the machines 
> generated the permission issues.
> 
> My question is about ES files across a machine file system, are there any 
> files outside ES installation folder (except for data/logs/configuration)?
> 
> Thanks,
> Yarden



Re: [Hadoop][Spark] Exclude metadata fields from _source

2015-02-18 Thread Itai Yaffe
Hey,
Has anyone experienced such an issue?
Perhaps Costin can help here?

Thanks!

On Thursday, February 12, 2015 at 8:27:14 AM UTC+2, Itai Yaffe wrote:
>
> Hey,
> I've recently started using Elasticsearch for Spark (Scala application).
> I've added elasticsearch-spark_2.10 version 2.1.0.BUILD-SNAPSHOT to my 
> Spark application pom file, and used 
> org.apache.spark.rdd.RDD[String].saveJsonToEs() to send documents to 
> Elasticsearch.
> When the documents are loaded to Elasticsearch, my metadata fields (e.g 
> id, index, etc.) are being loaded as part of the _source field.
> Is there a way to exclude them from the _source?
> I've tried using the new "es.mapping.exclude" configuration property 
> (added in this commit 
> 
>  
> - that's why I needed to take the latest build rather than using version 
> 2.1.0.Beta3), but it doesn't seem to have any effect (although I'm not sure 
> it's even possible to exclude fields I'm using for mapping, e.g "
> es.mapping.id").
>
> A code snippet (I'm using a single-node Elasticsearch cluster for testing 
> purposes and running the Spark app from my desktop) :
> val conf = new SparkConf()...
> conf.set("es.index.auto.create", "false")
> conf.set("es.nodes.discovery", "false")
> conf.set("es.nodes", "XXX:9200")
> conf.set("es.update.script", "XXX")
> conf.set("es.update.script.params", "param1:events")
> conf.set("es.update.retry.on.conflict" , "2")
> conf.set("es.write.operation", "upsert")
> conf.set("es.input.json", "true")
> val documentsRdd =  ...
> documentsRdd.saveJsonToEs("test/user", scala.collection.Map("
> es.mapping.id" -> "_id", "es.mapping.exclude" -> "_id"))
>
> The JSON looks like that :
> {
>   "_id": "",
>   "_type": "user",
>   "_index": "test",
>   "params": {
> "events": [
>   {
> ...
>   }
> ]
>   }
>
> Thanks!
> }
>


