Re: What tools to use for indexing ES in production?
Well, the sales messages are coming through Kafka. We need to extract some info from the database. We can do anything, really; I'm just not sure what common practice is here. There seem to be so many options. What kinds of questions am I not asking here?

On Monday, February 16, 2015 at 2:30:45 AM UTC-8, Mark Walkom wrote:
This depends on how the app is made and what options you have to extract data from it.

On 16 February 2015 at 20:28, Kevin Liu ke...@ticketfly.com wrote:
We want to index our sales records as they come in from our apps. We are using a Quartz job right now, which is really slow and not really real-time. We will be implementing a message bus soon for firing sales events. The process is: read from a message queue, grab some extra data from MySQL, do some ETL to construct the document, and index it into ES. I've been reading about Apache Spark, the ES river, and Logstash. My question is: what kinds of tools are right for the job here? Is Apache Spark overkill? Is DIY a better option here? What are you using? Please advise and point me to the right things to read.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/949b0c40-e8b1-4d40-bced-68e5c005c713%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
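The pipeline described above (read from a message queue, enrich from MySQL, ETL into a document, index into ES) is simple enough to sketch. The function and field names below are illustrative assumptions, not the poster's actual schema; in the real loop the consumer would read from Kafka and the finished documents would go to ES via the bulk API.

```python
# Sketch of the enrichment/ETL step of the pipeline, under assumed
# names: merge a raw sales event with extra columns fetched from MySQL.

def build_document(sale_event, extra_row):
    """Build the ES document from a raw event plus enrichment data."""
    doc = dict(sale_event)   # start from the raw message-queue event
    doc.update(extra_row)    # fold in the fields looked up in MySQL
    return doc

event = {"ticket_id": 42, "amount_cents": 2500}
extra = {"venue_name": "Example Hall"}
doc = build_document(event, extra)
# doc now carries both the event fields and the enrichment fields.
```

Keeping this step a pure function makes it easy to test independently of Kafka, MySQL, and ES, which matters when deciding between DIY and a framework like Spark.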
Re: Is it safe to change node names in an existing ElasticSearch cluster
Yes.

David

On 19 Feb 2015 at 08:56, Jan-Erik Westlund je.westl...@gmail.com wrote:
Correct, in that case it will not be a rolling upgrade ;-) The service will be down for a few minutes. Can I then change all the node names, and then start the services on all the nodes with the new names, without messing things up?

2015-02-19 7:58 GMT+01:00 David Pilato da...@pilato.fr:
You should define this in that case: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-gateway.html#recover-after
But it's not a rolling upgrade anymore, right? Your service will be down for some seconds/minutes, I guess.

On 19 Feb 2015 at 07:52, Jan-Erik Westlund je.westl...@gmail.com wrote:
I understand that, but is it safe to change all the node names and restart all the nodes at the same time?

On 19 Feb 2015 at 07:47, David Pilato da...@pilato.fr wrote:
You can safely change the elasticsearch.yml file while elasticsearch is running. This file is only loaded when elasticsearch starts.

On 19 Feb 2015 at 07:33, Jan-Erik Westlund je.westl...@gmail.com wrote:
Hi again! Thanks for the rolling restart info, that was really helpful. But since the elasticsearch.yml file is managed by Puppet, all the node names will change at pretty much the same time! So in my case it would be best to shut down the ES daemon on all nodes first, apply the Puppet changes, and then start the ES cluster again... Is it safe to do so? //Jan-Erik

On Wednesday, 18 February 2015 at 16:44:35 UTC+1, David Pilato wrote:
Have a look at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-shutdown.html#_rolling_restart_of_nodes_full_cluster_restart
-- David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

On 18 Feb 2015 at 16:37, Jan-Erik Westlund je.we...@gmail.com wrote:
Thanks David! All my recovery throttling settings are at their defaults in the elasticsearch.yml file.
How do I disable allocation in a running production environment? Do I need to disable allocation first, restart each node/daemon, and rename the nodes afterwards? Or maybe it would be better to take the ES cluster (all 3 nodes) down during a maintenance window, change all the names, and then restart the ES cluster nodes again? //Jan-Erik

On Wednesday, 18 February 2015 at 16:18:42 UTC+1, David Pilato wrote:
Yes. It's safe. You can do it one node at a time. If you already have data around and don't want your shards moving during this, you should disable allocation.
-- David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

On 18 Feb 2015 at 16:14, Jan-Erik Westlund je.we...@gmail.com wrote:
Hi! Is it safe to change the node names of my 3 nodes in an existing elasticsearch 1.4.0 cluster? The reason is to get rid of the random names like Elizabeth "Betsy" Braddock, Franz Kafka, etc... Is it just a matter of setting node.name: server name in elasticsearch.yml and then restarting the daemon? Do I do it one node at a time, or do I need to take down the cluster, change all the node names, and then bring up the cluster again? //Jan-Erik
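The "disable allocation" step David recommends goes through the cluster settings API. A minimal sketch of the request bodies, assuming the ES 1.x `cluster.routing.allocation.enable` setting (the `recover-after` settings from the linked gateway page are, by contrast, node-level elasticsearch.yml settings, not API calls):

```python
import json

# Bodies to PUT to /_cluster/settings around the rename/restart, e.g.
#   curl -XPUT localhost:9200/_cluster/settings -d "$body"
# "transient" means the flag clears on a full cluster restart.
disable_allocation = {"transient": {"cluster.routing.allocation.enable": "none"}}

# Re-enable once the renamed nodes have rejoined the cluster.
enable_allocation = {"transient": {"cluster.routing.allocation.enable": "all"}}

print(json.dumps(disable_allocation))
print(json.dumps(enable_allocation))
```

Disabling allocation before the restart keeps shards from being shuffled around while nodes come and go, which is exactly the concern raised in this thread.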
Re: Aggregations failing on fields with custom analyzer..
I don't know without a concrete example. I'd say that if your mapping has a number type and you send "123", it could work.
-- David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

On 19 Feb 2015 at 09:30, Anil Karaka anilkar...@gmail.com wrote:
It was my mistake: the field I was trying to aggregate on was mapped as double; I assumed it was a string after seeing some sample documents with strings. Why didn't ES throw an error when I indexed docs with strings instead of doubles?

On Thursday, February 19, 2015 at 1:35:08 PM UTC+5:30, David Pilato wrote:
Did you apply your analyzer to your mapping?

On 19 Feb 2015 at 08:53, Anil Karaka anilk...@gmail.com wrote:
http://stackoverflow.com/questions/28601082/terms-aggregation-failing-on-string-fields-with-a-custom-analyzer-in-elasticsear
Posted on Stack Overflow as well.

On Thursday, February 19, 2015 at 1:01:40 PM UTC+5:30, Anil Karaka wrote:
I want a custom analyzer that behaves exactly like not_analyzed, except that fields are case-insensitive. I have my analyzer as below (keyword tokenizer plus lowercase filter, same as not_analyzed but case-insensitive):

"index": { "analysis": { "analyzer": { "case_insensitive_keyword_analyzer": { "tokenizer": "keyword", "filter": "lowercase" } } } }

But when I try a terms aggregation over a field with strings analyzed as above, I get this error:

{ "error": "ClassCastException[org.elasticsearch.search.aggregations.bucket.terms.DoubleTerms$Bucket cannot be cast to org.elasticsearch.search.aggregations.bucket.terms.StringTerms$Bucket]", "status": 500 }

Are there additional settings that I have to update in my custom analyzer for my terms aggregation to work?
The better question is: I want a custom analyzer that does everything like not_analyzed but is case-insensitive. How do I achieve that?
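Putting the pieces from this thread together: the analyzer settings above with their quoting restored, plus the step David asks about, i.e. actually applying the analyzer in the mapping. The `doc` type and `os` field names below are placeholders; this is a sketch of the index-creation body, not the poster's exact template.

```python
import json

# Index body combining the thread's analyzer (keyword tokenizer +
# lowercase filter: not_analyzed-like but case-insensitive) with a
# mapping that applies it to a string field.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "case_insensitive_keyword_analyzer": {
                    "tokenizer": "keyword",
                    "filter": ["lowercase"],
                }
            }
        }
    },
    "mappings": {
        "doc": {  # placeholder type name
            "properties": {
                "os": {  # placeholder field name
                    "type": "string",
                    "analyzer": "case_insensitive_keyword_analyzer",
                }
            }
        }
    },
}
print(json.dumps(index_body, indent=2))
```

With this in place, "Windows" and "windows" index to the same single term, so a terms aggregation on the field buckets them together.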
Re: Aggregations failing on fields with custom analyzer..
It was my mistake: the field I was trying to aggregate on was mapped as double; I assumed it was a string after seeing some sample documents with strings. Why didn't ES throw an error when I indexed docs with strings instead of doubles?

On Thursday, February 19, 2015 at 1:35:08 PM UTC+5:30, David Pilato wrote:
Did you apply your analyzer to your mapping?

On 19 Feb 2015 at 08:53, Anil Karaka anilk...@gmail.com wrote:
http://stackoverflow.com/questions/28601082/terms-aggregation-failing-on-string-fields-with-a-custom-analyzer-in-elasticsear
Posted on Stack Overflow as well.

On Thursday, February 19, 2015 at 1:01:40 PM UTC+5:30, Anil Karaka wrote:
I want a custom analyzer that behaves exactly like not_analyzed, except that fields are case-insensitive. I have my analyzer as below (keyword tokenizer plus lowercase filter, same as not_analyzed but case-insensitive):

"index": { "analysis": { "analyzer": { "case_insensitive_keyword_analyzer": { "tokenizer": "keyword", "filter": "lowercase" } } } }

But when I try a terms aggregation over a field with strings analyzed as above, I get this error:

{ "error": "ClassCastException[org.elasticsearch.search.aggregations.bucket.terms.DoubleTerms$Bucket cannot be cast to org.elasticsearch.search.aggregations.bucket.terms.StringTerms$Bucket]", "status": 500 }

Are there additional settings that I have to update in my custom analyzer for my terms aggregation to work? The better question is: I want a custom analyzer that does everything like not_analyzed but is case-insensitive. How do I achieve that?
Re: ElasticSearch search performance question
Good stuff. You're seeing the benefits of not caching lots of single-use BitSets. Now if you swap queries in for your filters, you'll also see the benefits of not allocating multi-megabyte BitSets to hold what is typically a single set bit per query you run.

On Thursday, February 19, 2015 at 6:23:08 AM UTC, Jay Danielian wrote:
Just to update the thread: I added code to disable the cache on all the term filters we were using, and it made a huge performance improvement. Now we are able to service the queries with an average response time under two seconds, which is excellent (we are bundling several searches using _msearch, so 2 seconds total response is good). The search requests/sec metric still peaks at around 600/sec, but our CPU only spikes to about 65% now, so I think we can add more search threads to our config, as we are no longer maxing out CPU. I also see a bit of disk read activity now, which, against our non-RAID EBS drive, means we may be able to squeeze out more if we switch the disk setup. It seems like having these filters add cache items was wasting CPU on cache evictions and cache lookups (cache misses, really) for each query, which really only shows up when trying to push some load through. Thanks for everyone's suggestions!! J

On Friday, February 13, 2015 at 11:55:52 AM UTC-5, Jay Danielian wrote:
Thanks to all for these great suggestions. I haven't had a chance to change the syntax yet, as that is a risky thing for me to quickly change against our production setup. My plan is to try it this weekend (so I can properly test that the new syntax returns the same results). However, is there a way to turn filter caching off globally, via config or elsewhere? Thanks! J

On Friday, February 13, 2015 at 11:25:20 AM UTC-5, Mark Harwood wrote:
So I can see in the hot threads dump the initialization requests for those FixedBitSets I was talking about.
Looking at the number of docs in your index, I estimate each term to be allocating 140MB of memory in total for all these bitsets across all shards, given the 1bn docs in your index. Remember that you are probably setting only a single bit in each of these large structures. Another stat (if I read it correctly) shows 5m evictions of these cached filters, given their low reusability. It's fair to say you have some cache churn going on :) Did you try my earlier suggestion of queries, not filters?

On Friday, February 13, 2015 at 2:29:42 PM UTC, Jay Danielian wrote:
As requested, here is a dump of the hot threads output. Thanks! J

On Thursday, February 12, 2015 at 6:45:23 PM UTC-5, Nikolas Everett wrote:
You might want to try hitting hot threads while putting your load on it and seeing what you see. Or posting it. Nik

On Thu, Feb 12, 2015 at 4:44 PM, Jay Danielian jay.da...@circleback.com wrote:
Mark, thanks for the initial reply. Yes, your assumption about these things being very specific, and thus not likely to see any reuse from caching, is correct. I have attached some screenshots from the BigDesk plugin, which show a decent snapshot of what the server looked like while my tests were running. You can see the spikes in CPU that essentially covered the duration of the JMeter tests. At a high level, the only thing that seems to be really stressed on the server is CPU. But that makes me think there is something in my setup, query syntax, or perhaps the cache eviction rate, etc. that is causing it to spike so high. I also have concerns about the non-RAID 0 EBS volumes, as I know that having one large volume does not maximize throughput; however, just looking at the stats, it doesn't seem like IO is really a bottleneck. Here is a sample query structure: https://gist.github.com/jaydanielian/c2be885987f344031cfc Also, this is one query; in reality we use _msearch to pipeline several of these queries in one batch.
The queries also include custom routing (a route key) to make sure we only hit one shard. Thanks! J

On Thursday, February 12, 2015 at 4:22:29 PM UTC-5, Mark Walkom wrote:
It'd help if you could gist/pastebin/etc a query example. Also, your current ES and Java need updating; there are known issues with Java 1.7u55, and you will always see performance boosts running the latest version of ES. That aside, what is your current resource utilisation like? Are you seeing lots of cache evictions, high heap use, high CPU, IO delays?

On 13 February 2015 at 07:32, Jay Danielian jay.da...@circleback.com wrote:
I know this is difficult to answer; the real answer is always "It Depends" :) But I am going to go ahead and hope I get some feedback here. We are mainly using ES to issue terms searches against fields that are non-analyzed. We are using ES like a key-value store, where once the match is found we parse the _source JSON and return our model. We
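The fix that resolved this thread (disabling the cache on single-use term filters) can be expressed per filter in the ES 1.x query DSL with the `_cache` flag. The field and value below are placeholders, not the actual query from Jay's gist:

```python
# ES 1.x search body with a term filter whose caching is disabled.
# Mark's further suggestion is to use a term *query* instead of a
# filter, which avoids allocating a bitset at all.
search_body = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {
                "term": {
                    "email": "user@example.com",  # placeholder field/value
                    "_cache": False,              # skip the filter cache
                }
            }
        }
    }
}
```

For highly selective, rarely repeated lookups like these, caching buys nothing and costs CPU in evictions and lookups, which matches the behaviour Jay measured under load.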
Re: Aggregations failing on fields with custom analyzer..
If you can provide a full working example as I did, we can try it and see what is wrong.
-- David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

On 19 Feb 2015 at 10:01, Anil Karaka anilkar...@gmail.com wrote:
I'm getting this error as well using your PUT requests. It feels like I'm doing something wrong, but I don't know what exactly. I'm using this index template: https://gist.github.com/syllogismos/c2dde4f097fea149e1a0 I didn't specify a particular mapping for my index but reindexed from a previous index, and ended up with the mapping and documents that look like the above. Am I missing an obvious mistake? So lost right now.

On Thursday, February 19, 2015 at 2:23:10 PM UTC+5:30, David Pilato wrote:
I think you are doing something wrong.

DELETE index
PUT index
{ "mappings": { "doc": { "properties": { "foo": { "type": "double" } } } } }
PUT index/doc/1
{ "foo": "bar" }

gives:

{ "error": "MapperParsingException[failed to parse [foo]]; nested: NumberFormatException[For input string: \"bar\"];", "status": 400 }

-- David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

On 19 Feb 2015 at 09:39, Anil Karaka anilk...@gmail.com wrote:

"_source": { "Sort": "", "gt": "2015-02-18T15:07:10", "uid": "54867dc55b482b04da7f23d8", "usId": "54867dc55b482b04da7f23d7", "ut": "2015-02-18T20:37:10", "act": "productlisting", "st": "2015-02-18T15:07:46", "Filter": "", "av": "3.0.0.0", "ViewType": "SmallSingleList", "os": "Windows", "categoryid": "home-kitchen-curtains-blinds" }

"properties": { "uid": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" }, "ViewType": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" }, "usId": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" }, "os": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" }, "Sort": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" }, "Filter": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" }, "categoryid": { "type": "double" }, "gt": { "format": "dateOptionalTime", "type": "date" }, "ut": { "format": "dateOptionalTime", "type": "date" }, "st": { "format": "dateOptionalTime", "type": "date" }, "act": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" }, "av": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" } }

A sample document and the index mappings are above.

On Thursday, February 19, 2015 at 2:03:11 PM UTC+5:30, David Pilato wrote:
I don't know without a concrete example. I'd say that if your mapping has a number type and you send "123", it could work.
-- David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

On 19 Feb 2015 at 09:30, Anil Karaka anilk...@gmail.com wrote:
It was my mistake: the field I was trying to aggregate on was mapped as double; I assumed it was a string after seeing some sample documents with strings. Why didn't ES throw an error when I indexed docs with strings instead of doubles?

On Thursday, February 19, 2015 at 1:35:08 PM UTC+5:30, David Pilato wrote:
Did you apply your analyzer to your mapping?

On 19 Feb 2015 at 08:53, Anil Karaka anilk...@gmail.com wrote:
http://stackoverflow.com/questions/28601082/terms-aggregation-failing-on-string-fields-with-a-custom-analyzer-in-elasticsear
Posted on Stack Overflow as well.

On Thursday, February 19, 2015 at 1:01:40 PM UTC+5:30, Anil Karaka wrote:
I want a custom analyzer that behaves exactly like not_analyzed, except that fields are case-insensitive. I have my analyzer as below (keyword tokenizer plus lowercase filter, same as not_analyzed but case-insensitive):

"index": { "analysis": { "analyzer": { "case_insensitive_keyword_analyzer": { "tokenizer": "keyword", "filter": "lowercase" } } } }

But when I try a terms aggregation over a field with strings analyzed as above, I get this error:

{ "error": "ClassCastException[org.elasticsearch.search.aggregations.bucket.terms.DoubleTerms$Bucket cannot be cast to org.elasticsearch.search.aggregations.bucket.terms.StringTerms$Bucket]", "status": 500 }
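The root cause in this thread was a field whose mapped type (double) didn't match what the documents appeared to contain. A quick sanity check before aggregating is to fetch the mapping (GET index/_mapping) and look up the field's declared type; below, a small helper does that lookup against an excerpt of the properties shown in the thread:

```python
def declared_type(properties, field):
    """Return the type a field is mapped as, or None if unmapped."""
    return properties.get(field, {}).get("type")

# Excerpt of the "properties" block posted in this thread.
properties = {
    "os": {"analyzer": "case_insensitive_keyword_analyzer", "type": "string"},
    "categoryid": {"type": "double"},
}

declared_type(properties, "categoryid")
# -> "double": a terms aggregation on this field builds numeric
# (DoubleTerms) buckets, hence the ClassCastException when the data
# is actually strings.
```

Checking the live mapping rather than eyeballing sample documents would have surfaced the mismatch immediately.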
Re: Is it safe to change node names in an existing ElasticSearch cluster
Ok, thanks again.

2015-02-19 9:06 GMT+01:00 David Pilato da...@pilato.fr:
Yes.

David

On 19 Feb 2015 at 08:56, Jan-Erik Westlund je.westl...@gmail.com wrote:
Correct, in that case it will not be a rolling upgrade ;-) The service will be down for a few minutes. Can I then change all the node names, and then start the services on all the nodes with the new names, without messing things up?

2015-02-19 7:58 GMT+01:00 David Pilato da...@pilato.fr:
You should define this in that case: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-gateway.html#recover-after
But it's not a rolling upgrade anymore, right? Your service will be down for some seconds/minutes, I guess.

On 19 Feb 2015 at 07:52, Jan-Erik Westlund je.westl...@gmail.com wrote:
I understand that, but is it safe to change all the node names and restart all the nodes at the same time?

On 19 Feb 2015 at 07:47, David Pilato da...@pilato.fr wrote:
You can safely change the elasticsearch.yml file while elasticsearch is running. This file is only loaded when elasticsearch starts.

On 19 Feb 2015 at 07:33, Jan-Erik Westlund je.westl...@gmail.com wrote:
Hi again! Thanks for the rolling restart info, that was really helpful. But since the elasticsearch.yml file is managed by Puppet, all the node names will change at pretty much the same time! So in my case it would be best to shut down the ES daemon on all nodes first, apply the Puppet changes, and then start the ES cluster again... Is it safe to do so? //Jan-Erik

On Wednesday, 18 February 2015 at 16:44:35 UTC+1, David Pilato wrote:
Have a look at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-shutdown.html#_rolling_restart_of_nodes_full_cluster_restart
-- David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

On 18 Feb 2015 at 16:37, Jan-Erik Westlund je.we...@gmail.com wrote:
Thanks David! All my recovery throttling settings are at their defaults in the elasticsearch.yml file. How do I disable allocation in a running production environment? Do I need to disable allocation first, restart each node/daemon, and rename the nodes afterwards? Or maybe it would be better to take the ES cluster (all 3 nodes) down during a maintenance window, change all the names, and then restart the ES cluster nodes again? //Jan-Erik

On Wednesday, 18 February 2015 at 16:18:42 UTC+1, David Pilato wrote:
Yes. It's safe. You can do it one node at a time. If you already have data around and don't want your shards moving during this, you should disable allocation.
-- David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

On 18 Feb 2015 at 16:14, Jan-Erik Westlund je.we...@gmail.com wrote:
Hi! Is it safe to change the node names of my 3 nodes in an existing elasticsearch 1.4.0 cluster? The reason is to get rid of the random names like Elizabeth "Betsy" Braddock, Franz Kafka, etc... Is it just a matter of setting node.name: server name in elasticsearch.yml and then restarting the daemon? Do I do it one node at a time, or do I need to take down the cluster, change all the node names, and then bring up the cluster again? //Jan-Erik
Re: Aggregations failing on fields with custom analyzer..
"_source": { "Sort": "", "gt": "2015-02-18T15:07:10", "uid": "54867dc55b482b04da7f23d8", "usId": "54867dc55b482b04da7f23d7", "ut": "2015-02-18T20:37:10", "act": "productlisting", "st": "2015-02-18T15:07:46", "Filter": "", "av": "3.0.0.0", "ViewType": "SmallSingleList", "os": "Windows", "categoryid": "home-kitchen-curtains-blinds" }

"properties": { "uid": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" }, "ViewType": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" }, "usId": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" }, "os": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" }, "Sort": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" }, "Filter": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" }, "categoryid": { "type": "double" }, "gt": { "format": "dateOptionalTime", "type": "date" }, "ut": { "format": "dateOptionalTime", "type": "date" }, "st": { "format": "dateOptionalTime", "type": "date" }, "act": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" }, "av": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" } }

A sample document and the index mappings are above.

On Thursday, February 19, 2015 at 2:03:11 PM UTC+5:30, David Pilato wrote:
I don't know without a concrete example. I'd say that if your mapping has a number type and you send "123", it could work.
-- David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

On 19 Feb 2015 at 09:30, Anil Karaka anilk...@gmail.com wrote:
It was my mistake: the field I was trying to aggregate on was mapped as double; I assumed it was a string after seeing some sample documents with strings. Why didn't ES throw an error when I indexed docs with strings instead of doubles?

On Thursday, February 19, 2015 at 1:35:08 PM UTC+5:30, David Pilato wrote:
Did you apply your analyzer to your mapping?

On 19 Feb 2015 at 08:53, Anil Karaka anilk...@gmail.com wrote:
http://stackoverflow.com/questions/28601082/terms-aggregation-failing-on-string-fields-with-a-custom-analyzer-in-elasticsear
Posted on Stack Overflow as well.

On Thursday, February 19, 2015 at 1:01:40 PM UTC+5:30, Anil Karaka wrote:
I want a custom analyzer that behaves exactly like not_analyzed, except that fields are case-insensitive. I have my analyzer as below (keyword tokenizer plus lowercase filter, same as not_analyzed but case-insensitive):

"index": { "analysis": { "analyzer": { "case_insensitive_keyword_analyzer": { "tokenizer": "keyword", "filter": "lowercase" } } } }

But when I try a terms aggregation over a field with strings analyzed as above, I get this error:

{ "error": "ClassCastException[org.elasticsearch.search.aggregations.bucket.terms.DoubleTerms$Bucket cannot be cast to org.elasticsearch.search.aggregations.bucket.terms.StringTerms$Bucket]", "status": 500 }

Are there additional settings that I have to update in my custom analyzer for my terms aggregation to work? The better question is: I want a custom analyzer that does everything like not_analyzed but is case-insensitive. How do I achieve that?
Re: Aggregations failing on fields with custom analyzer..
I'm getting this error as well using your PUT requests. It feels like I'm doing something wrong, but I don't know what exactly. I'm using this index template: https://gist.github.com/syllogismos/c2dde4f097fea149e1a0 I didn't specify a particular mapping for my index but reindexed from a previous index, and ended up with that mapping and documents that look like the above. Am I missing an obvious mistake? So lost right now.

On Thursday, February 19, 2015 at 2:23:10 PM UTC+5:30, David Pilato wrote: I think you are doing something wrong.

DELETE index

PUT index
{
  "mappings": {
    "doc": {
      "properties": {
        "foo": { "type": "double" }
      }
    }
  }
}

PUT index/doc/1
{
  "foo": "bar"
}

gives:

{ "error": "MapperParsingException[failed to parse [foo]]; nested: NumberFormatException[For input string: \"bar\"];", "status": 400 }

-- David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr | @scrutmydocs https://twitter.com/scrutmydocs

On 19 Feb 2015 at 09:39, Anil Karaka anilk...@gmail.com wrote:

"_source": {
  "Sort": "", "gt": "2015-02-18T15:07:10", "uid": "54867dc55b482b04da7f23d8", "usId": "54867dc55b482b04da7f23d7",
  "ut": "2015-02-18T20:37:10", "act": "productlisting", "st": "2015-02-18T15:07:46", "Filter": "", "av": "3.0.0.0",
  "ViewType": "SmallSingleList", "os": "Windows", "categoryid": "home-kitchen-curtains-blinds"
}

"properties": {
  "uid": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" },
  "ViewType": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" },
  "usId": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" },
  "os": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" },
  "Sort": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" },
  "Filter": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" },
  "categoryid": { "type": "double" },
  "gt": { "format": "dateOptionalTime", "type": "date" },
  "ut": { "format": "dateOptionalTime", "type": "date" },
  "st": { "format": "dateOptionalTime", "type": "date" },
  "act": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" },
  "av": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" }
}

A sample document and the index mappings above.

On Thursday, February 19, 2015 at 2:03:11 PM UTC+5:30, David Pilato wrote: I don't know without a concrete example. I'd say that if your mapping has a number type and you send "123", it could work.

-- David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

On 19 Feb 2015 at 09:30, Anil Karaka anilk...@gmail.com wrote: It was my mistake: the field I was trying to aggregate on was mapped as double; I assumed it was a string after seeing some sample documents containing strings. Why didn't ES throw an error when I indexed docs with strings instead of doubles?
On Thursday, February 19, 2015 at 1:35:08 PM UTC+5:30, David Pilato wrote: Did you apply your analyzer to your mapping? David
Re: Aggregations failing on fields with custom analyzer..
I understand what you are saying. I was able to recreate the same error you showed: I was not able to insert a string into your index whose mapping is double, but I am able to insert a string into my older index whose mapping is double. Very weird. But I don't know how you could recreate my case. I'm using this index template, https://gist.github.com/syllogismos/c2dde4f097fea149e1a0, and then reindexed from an older index; it took the mapping as double, yet has strings in the documents indexed later. Thanks for your help.

On Thursday, February 19, 2015 at 2:34:14 PM UTC+5:30, David Pilato wrote: If you can provide a full working example as I did, we can try it and see what is wrong. -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr | @scrutmydocs https://twitter.com/scrutmydocs

On 19 Feb 2015 at 10:01, Anil Karaka anilk...@gmail.com wrote: I'm getting this error as well using your PUT requests. It feels like I'm doing something wrong, but I don't know what exactly. I'm using this index template: https://gist.github.com/syllogismos/c2dde4f097fea149e1a0 I didn't specify a particular mapping for my index but reindexed from a previous index, and ended up with that mapping and documents that look like the above. Am I missing an obvious mistake? So lost right now.

On Thursday, February 19, 2015 at 2:23:10 PM UTC+5:30, David Pilato wrote: I think you are doing something wrong.
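For reference, the root cause in this thread turned out to be a field mapped as double that received string data after a reindex; the analyzer itself was fine. A minimal sketch of applying a case-insensitive keyword analyzer correctly, per the ES 1.x create-index API (index, type, and field names are illustrative):

```
PUT /myindex
{
  "settings": {
    "analysis": {
      "analyzer": {
        "case_insensitive_keyword_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "os": {
          "type": "string",
          "analyzer": "case_insensitive_keyword_analyzer"
        }
      }
    }
  }
}
```

A terms aggregation on os then buckets values case-insensitively: the keyword tokenizer keeps each value as a single token and the lowercase filter normalizes it.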
Re: ES OOMing and not triggering cache circuit breakers, using LocalManualCache
After some experimentation, I believe _cluster/stats shows the total field data across the whole cluster. I managed to push my test cluster to 198MiB of field data cache usage. As a result, based on Zachary's feedback, I've set the following values in my elasticsearch.yml:

indices.fielddata.cache.size: 15gb
indices.fielddata.cache.expire: 7d

On Thursday, 12 February 2015 15:15:32 UTC, Wilfred Hughes wrote: Oh, is field data per-node or total across the cluster? I grabbed a test cluster with two data nodes, and I deliberately set fielddata really low:

indices.fielddata.cache.size: 100mb

However, after a few queries, I'm seeing more than 100MiB in use:

$ curl "http://localhost:9200/_cluster/stats?human&pretty"
...
"fielddata": { "memory_size": "119.7mb", "memory_size_in_bytes": 125543995, "evictions": 0 },

Is this expected?

On Wednesday, 11 February 2015 18:57:28 UTC, Zachary Tong wrote: LocalManualCache is a component of Guava's LRU cache https://code.google.com/p/guava-libraries/source/browse/guava-gwt/src-super/com/google/common/cache/super/com/google/common/cache/CacheBuilder.java, which is used by Elasticsearch for both the filter and field data caches. Based on your node stats, I'd agree it is the field data usage that is causing your OOMs. The CircuitBreaker helps prevent OOM, but it works on a per-request basis. It's possible for individual requests to pass the CB because they use small subsets of fields, but over time the set of fields loaded into field data continues to grow and you'll OOM anyway. I would prefer to set a field data limit rather than an expiration. A hard limit prevents OOM because you don't allow the cache to grow any more. An expiration does not guarantee that, since you could get a burst of activity that still fills up the heap and OOMs before the expiration can work.
-Z

On Wednesday, February 11, 2015 at 12:50:45 PM UTC-5, Wilfred Hughes wrote: After examining some other nodes that were using a lot of their heap, I think this is actually the field data cache:

$ curl "http://localhost:9200/_cluster/stats?human&pretty"
...
"fielddata": { "memory_size": "21.3gb", "memory_size_in_bytes": 22888612852, "evictions": 0 },
"filter_cache": { "memory_size": "6.1gb", "memory_size_in_bytes": 6650700423, "evictions": 12214551 },

Since this is storing logstash data, I'm going to add the following lines to my elasticsearch.yml and see if I observe a difference once deployed to production:

# Don't hold field data caches for more than a day, since data is
# grouped by day and we quickly lose interest in historical data.
indices.fielddata.cache.expire: 1d

On Wednesday, 11 February 2015 16:29:22 UTC, Wilfred Hughes wrote: Hi all, I have an ES 1.2.4 cluster which is occasionally running out of heap. I have ES_HEAP_SIZE=31G, and according to the heap dump generated, my biggest memory users were:

org.elasticsearch.common.cache.LocalCache$LocalManualCache 55%
org.elasticsearch.indices.cache.filter.IndicesFilterCache 11%

and nothing else used more than 1%. It's not clear to me what this cache is. I can't find any references to ManualCache in the elasticsearch source code, and the docs http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/index-modules-fielddata.html suggest to me that the circuit breakers should stop requests or reduce cache usage rather than OOMing.
At the moment my cache filled up, the node was actually trying to index some data:

[2015-02-11 08:14:29,775][WARN ][index.translog ] [data-node-2] [logstash-2015.02.11][0] failed to flush shard on translog threshold
org.elasticsearch.index.engine.FlushFailedEngineException: [logstash-2015.02.11][0] Flush failed
    at org.elasticsearch.index.engine.internal.InternalEngine.flush(InternalEngine.java:805)
    at org.elasticsearch.index.shard.service.InternalIndexShard.flush(InternalIndexShard.java:604)
    at org.elasticsearch.index.translog.TranslogService$TranslogBasedFlush$1.run(TranslogService.java:202)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
    at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:4416)
    at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2989)
    at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3096)
    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3063)
    at org.elasticsearch.index.engine.internal.InternalEngine.flush(InternalEngine.java:797)
    ... 5 more
[2015-02-11 08:14:29,812][DEBUG][action.bulk
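Zachary's suggestion of a hard cap (rather than, or in addition to, an expiry) translates to elasticsearch.yml settings like the following. This is a sketch; the values are illustrative, and the breaker setting name shown is the pre-1.4 one used by the 1.2.4 cluster in this thread (it was renamed to indices.breaker.fielddata.limit in 1.4), so verify against the exact version in use:

```yaml
# elasticsearch.yml - field data safety valves (sketch, values illustrative)
indices.fielddata.cache.size: 15gb      # hard cap: evict entries before the heap fills
indices.fielddata.cache.expire: 7d      # optional; the cap is the real OOM protection
indices.fielddata.breaker.limit: 60%    # per-request circuit breaker (1.x setting name)
```

The cap turns unbounded growth into LRU evictions, which show up in the "evictions" counter of the fielddata stats above.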
elasticsearch-http-basic with ES 1.4.2
Has anyone had any luck using http-basic with ES 1.4.2? I just want to put some basic security on my ES instance from outside the cluster, and this appears to be the easiest way, with just whitelisting my other nodes. When I install and configure it, requests show as going to the http-basic plugin, but it always accepts the username/password from localhost even if I put the wrong info in there. It also never prompts for a username/password from other IPs connecting to it. Locally it shows this:

[root@elasticsearch1 http-basic]# curl -v --user bob:wrongpassword localhost:9200
* About to connect() to localhost port 9200 (#0)
*   Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 9200 (#0)
* Server auth using Basic with user 'bob'
GET / HTTP/1.1
Authorization: Basic Ym9iOnBhc3N3b3JkMTIzNTU1
User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2
Host: localhost:9200
Accept: */*
HTTP/1.1 200 OK
Content-Type: text/plain; charset=UTF-8
Content-Length: 9
* Connection #0 to host localhost left intact
* Closing connection #0

From external sources it shows this in the logs:

[2015-02-19 14:56:29,816][INFO ][com.asquera.elasticsearch.plugins.http.HttpBasicServer] [elasticsearch1] Authorization:null, Host:192.168.1.4:9200, Path:/, :null, Request-IP:192.168.1.4, Client-IP:null, X-Client-IPnull
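The localhost behavior described is consistent with an IP whitelist that bypasses authentication for whitelisted addresses. A sketch of elasticsearch.yml settings for the Asquera elasticsearch-http-basic plugin; the setting names here are assumptions taken from memory of the plugin's README and must be checked against the README of the installed plugin version:

```yaml
# elasticsearch.yml - elasticsearch-http-basic sketch (setting names are
# assumptions; verify against the plugin README for your version)
http.basic.enabled: true
http.basic.user: "bob"
http.basic.password: "secret"
# Whitelisted hosts skip auth entirely - a default localhost entry here
# would explain why wrong credentials are accepted from 127.0.0.1.
http.basic.ipwhitelist: ["localhost"]
```

If external requests never get a 401 challenge, it is also worth confirming the plugin actually loaded on the node answering those requests, not just on localhost's node.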
Re: elastic search on t2.micro (Amazon WS)
Depends on your dataset and use. I don't go below 2GB heaps when I'm just testing things.

On 20 February 2015 at 05:52, Seung Chan Lim djs...@gmail.com wrote: What's the minimum RAM requirement? slim

On Wednesday, February 18, 2015 at 5:18:58 PM UTC-5, Mark Walkom wrote: Your only real option here is to get a machine with more RAM. Try spinning up a VM locally, on your desktop/laptop.

On 19 February 2015 at 00:52, Seung Chan Lim djs...@gmail.com wrote: I'm trying to see if I can get Elasticsearch (ES) 1.3.8 working with Couchbase (CB) 3.0.2 on a t2.micro (Amazon WS). A t2.micro has 1 GB of RAM, which isn't a lot, but I'm only doing test development on this with not a lot of documents (1000). I just installed ES, followed the CB instructions to install the plugin, and set up XDCR to get replication going from CB to ES. I also configured ES with 0 replicas and 1 shard (hoping this would help minimize RAM usage). But I'm still seeing behavior where ES locks up the server, making it unresponsive, then eventually complains of lack of memory. Is there something else I can do to get this working on a t2.micro? I'm a complete newbie to ES, and any help would be great, thank you. slim
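On a 1 GB t2.micro the JVM heap has to be capped well below total RAM, or the node will swap or be killed under memory pressure. A minimal sketch; the value is illustrative, and where the variable is set depends on how ES is launched (e.g. /etc/sysconfig/elasticsearch on CentOS, or the shell that starts bin/elasticsearch):

```shell
# Cap the ES heap at roughly half of the t2.micro's 1 GB of RAM,
# leaving the remainder for Lucene's filesystem cache and the OS.
export ES_HEAP_SIZE=512m
```

Even capped, 1 GB total is below the 2 GB heaps Mark mentions for testing, so some memory pressure is to be expected.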
[Spark] Unable to index JSON from HDFS using SchemaRDD.saveToES()
This is my first real attempt at Spark/Scala, so be gentle. I have a file called test.json on HDFS that I'm trying to read and index using Spark. I'm able to read the file via SQLContext.jsonFile(), but when I try to use SchemaRDD.saveToEs() I get an "invalid JSON fragment received" error. I'm thinking that the saveToEs() function isn't actually formatting the output as JSON and instead is just sending the value field of the RDD. What am I doing wrong?

Spark 1.2.0, elasticsearch-hadoop 2.1.0.BUILD-20150217

test.json:
{"key":"value"}

spark-shell:
import org.apache.spark.SparkContext._
import org.elasticsearch.spark._
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext._
val input = sqlContext.jsonFile("hdfs://nameservice1/user/mshirley/test.json")
input.saveToEs("mshirley_spark_test/test")

error:
<snip>
org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Found unrecoverable error [Bad Request(400) - Invalid JSON fragment received[["value"]][MapperParsingException[failed to parse]; nested: ElasticsearchParseException[Failed to derive xcontent from (offset=13, length=9): [123, 34, 105, 110, 100, 101, 120, 34, 58, 123, 125, 125, 10, 91, 34, 118, 97, 108, 117, 101, 34, 93, 10]]; ]]; Bailing out..
<snip>

input:
res2: org.apache.spark.sql.SchemaRDD = SchemaRDD[6] at RDD at SchemaRDD.scala:108
== Query Plan ==
== Physical Plan ==
PhysicalRDD [key#0], MappedRDD[5] at map at JsonRDD.scala:47

input.printSchema():
root
 |-- key: string (nullable = true)
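The symptom (only ["value"] reaching ES) matches the plain-RDD saveToEs from org.elasticsearch.spark._ treating each Row as a bare collection of values, with the field names lost. A hedged sketch of the usual fix, assuming es-hadoop 2.1's Spark SQL support: import the SQL-aware package so the SchemaRDD overload of saveToEs is picked up and rows are serialized together with their schema:

```scala
import org.apache.spark.sql.SQLContext
// NOTE: org.elasticsearch.spark.sql._ (not org.elasticsearch.spark._)
// brings in the SchemaRDD-aware saveToEs.
import org.elasticsearch.spark.sql._

val sqlContext = new SQLContext(sc)
val input = sqlContext.jsonFile("hdfs://nameservice1/user/mshirley/test.json")
// Rows are now written as named-field documents, e.g. {"key": "value"}
input.saveToEs("mshirley_spark_test/test")
```

This is a sketch against a running Spark shell with an ES cluster configured; it cannot be run standalone.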
Elasticsearch index name question
I use a monitoring framework designed as a solution to monitor heterogeneous networks and systems in terms of services (platforms, applications for TELCO systems). This framework collects the required data synchronously from several devices, stores it in a MongoDB database, and then transfers all stored collections from MongoDB to Elasticsearch via the river-mongodb plugin. We can have a huge amount of data stored in a single Elasticsearch index; for example, about 5.2 million documents can be collected in a single MongoDB collection in only 8 hours of monitoring, so the number of documents in a single index grows rapidly. At present I have installed, on a CentOS 6.5 server, an Elasticsearch cluster configuration with one node and five indices, but only one index for all synchronous data. My problem is to be able to create different indices in Elasticsearch across which I can split the synchronous data, and so I would like to know if it is possible to create an index name with a timestamp appended to it, the way Logstash uses the timestamp from an event to derive the related Elasticsearch index name. Any ideas, suggestions, help?
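Index names in Elasticsearch are just strings chosen by the client at write time, so time-based indices only require deriving the name from each document's timestamp before indexing. A minimal sketch; the prefix and daily granularity are illustrative, mirroring Logstash's default logstash-YYYY.MM.DD convention:

```python
from datetime import datetime

def index_name_for(event_time, prefix="monitoring"):
    """Derive a daily index name from an event timestamp,
    in the style of Logstash's default logstash-%Y.%m.%d indices."""
    return "%s-%s" % (prefix, event_time.strftime("%Y.%m.%d"))

# Each document is routed to the index for its own day:
print(index_name_for(datetime(2015, 2, 19, 14, 30)))  # monitoring-2015.02.19
```

The operational win is that old data can then be dropped by deleting whole indices instead of deleting millions of individual documents from one big index.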
Re: elastic search cluster behind azure public load balancer
OK, thanks Mark. So you are saying I should set node.master: true as well as node.data: true for all 3 nodes in the cluster. And will this hold true for a 4-node or even larger cluster? As of now I am making data redundant on all 3 nodes, but please advise on performance. The Azure load balancer's chances of failure are much lower than a VM's. The redundancy is required from a backup point of view. So I will have a 4-node cluster with data replicated on all 4 nodes: 3 nodes in the same region (say East US) behind the load balancer, and the 4th node in a different region (say West US) but NOT behind the load balancer, just holding all the data. In case the East US data center has a problem, I can redirect all traffic to the West US data center, where the single node will always have all up-to-date data, or I can even take backups from the West US data center. Thanks, Subodh

On Thursday, February 19, 2015 at 7:10:52 AM UTC+5:30, Mark Walkom wrote: Yes, the master can serve requests. You don't really want 2 masters and 1 data node though; make all 3 master+data to start with. And sure, the client can be a SPOF, but then isn't a single load balancer a SPOF as well? So the question remains where you are happy dealing with these points, because at some point you cannot make *everything* redundant without being excessive.

On 18 February 2015 at 21:36, Subodh Patil subod...@gmail.com wrote: I am trying to set up an ES cluster behind an Azure load balancer / cloud service. The cluster is 3 nodes with no specific data/client/master node settings. By default 2 nodes are elected as master and 1 as a data node. As requests (create/update/search) from the application come to the Azure load balancer on port 9200, load-balanced across all 3 VMs, a request can go to any VM. Will a master node be able to serve the requests? Many articles say that you don't need a load balancer for an ES cluster, just use a client node, but then that becomes a single point of failure, as an Azure VM can go down at any point in time.

So load balancing is required mainly for high availability from an infrastructure point of view. Please suggest a cluster setup and which nodes (data or client) to put behind the load balancer. ES version 1.4.1 on Windows Server 2012 R2 VMs.
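Mark's "make all 3 master+data" advice maps onto elasticsearch.yml settings like the following. A sketch for the three East US nodes, with the ES 1.x quorum setting added because three master-eligible nodes need a quorum of two to avoid split brain:

```yaml
# elasticsearch.yml on each of the three East US nodes (sketch)
node.master: true
node.data: true
# quorum of master-eligible nodes: (3 / 2) + 1 = 2
discovery.zen.minimum_master_nodes: 2
```

Note that if the fourth (West US) node joins the same cluster as master-eligible, the quorum calculation changes to (4 / 2) + 1 = 3, so its role needs to be decided explicitly.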
Script based transform during index
Hi group, first, apologies if this is not the right way to ask the question below, but this is my first time. I have some documents with source IP and destination IP addresses. I want to enhance these documents with geo info via a script transform when they arrive. So I created a script in Python, and I resolve geo info for every destination IP and store it in _source (from what I understand). The template for the index is below. Everything works fine; however, I have two issues:

1. The field (which does not exist beforehand and which I create, namely location) is not shown in a search unless explicitly asked for.
2. Kibana 3 does not show this field, or shows it as empty. The location field is there if I explicitly ask for it.

Can you please let me know how I can have these fields, added prior to indexing, available as normal fields? Thanks in advance!

P.S. Inside the Python script I update the below:

ctx['location'] = ip2geo(dest_ip)
ctx['_source']['location'] = ip2geo(dest_ip)

POST /geotest/gdoc/_search
{
  "query": { "match_all": {} },
  "fields": [
    "src_ip",
    "dst_ip",
    "location"  -- This is the new field which I add via ctx['_source']['location'] = ip2geo(dest_ip)
  ]
}

My template:

PUT /_template/geo
{
  "template": "geo*",
  "mappings": {
    "gdoc": {
      "transform": {
        "lang": "python",
        "script": "python_ip2geo"
      },
      "_source": { "enabled": true },
      "properties": {
        "src_ip": { "type": "ip", "index": "not_analyzed" },
        "dst_ip": { "type": "ip", "index": "not_analyzed" },
        "location": {
          "type": "geo_point",
          "index": "analyzed",  -- does not need to be analyzed really
          "store": true,
          "doc_values": true,
          "null_value":
        }
      }
    }
  }
}

Can someone explain a bit more about how transform fields are stored and how they can be indexed? Thanks in advance.
Date Histogram Bucket Count ?
Is it possible to apply a filter to date histogram buckets? For example, return only buckets that are below a certain value. I was looking to do it with scripts, but I don't know how to access buckets in a script, so if anyone knows anything about that, it would be very, very helpful.
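In the ES 1.x line there is no aggregation that filters buckets by their own doc_count (a bucket-selector style step came later), so the practical route is filtering the response client-side. A minimal sketch in Python; the aggregation name per_day and the response fragment are illustrative:

```python
def buckets_below(agg_response, max_count, agg_name="per_day"):
    """Keep only date-histogram buckets whose doc_count is below max_count."""
    buckets = agg_response["aggregations"][agg_name]["buckets"]
    return [b for b in buckets if b["doc_count"] < max_count]

# Hypothetical ES response fragment:
resp = {"aggregations": {"per_day": {"buckets": [
    {"key_as_string": "2015-02-18", "doc_count": 42},
    {"key_as_string": "2015-02-19", "doc_count": 7},
]}}}
print(buckets_below(resp, 10))  # [{'key_as_string': '2015-02-19', 'doc_count': 7}]
```

The full histogram still travels over the wire; the filter only trims what the application sees.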
Re: Elasticsearch performance tuning
Hi Mark Walkom, I have given the logstash conf file below.

Logstash conf:

input {
  file {
  }
}
filter {
  mutate { gsub => [ "message", "\n", "" ] }
  mutate { gsub => [ "message", "\t", "" ] }
  multiline {
    pattern => "^ "
    what => "previous"
  }
  grok {
    match => [ "message", "%{TIME:log_time}\|%{WORD:Message_type}\|%{GREEDYDATA:Component}\|%{NUMBER:line_number}\| %{GREEDYDATA:log_message}" ]
    match => [ "path", "%{GREEDYDATA}/%{GREEDYDATA:loccode}/%{GREEDYDATA:_machine}\:%{DATE:logdate}.log" ]
    break_on_match => false
  }
  # To check whether location is S or L
  if [loccode] == "S" or [loccode] == "L" {
    ruby {
      code => "temp = event['_machine'].split('_')
               if !temp.nil? || !temp.empty?
                 event['_machine'] = temp[0]
               end"
    }
  }
  mutate {
    add_field => [ "event_timestamp", "%{@timestamp}" ]
    replace => [ "log_time", "%{logdate} %{log_time}" ]
    lowercase => [ "loccode" ]
    # Remove the 'logdate' field since we don't need it anymore.
    remove => [ "logdate" ]
  }
  # to get all site details (site name, city and co-ordinates)
  sitelocator {
    sitename => "loccode"
    datafile => "vendor/sitelocator/SiteDetails.csv"
  }
  date {
    locale => "en"
    match => [ "log_time", "yyyy-MM-dd HH:mm:ss", "MM-dd-yyyy HH:mm:ss.SSS", "ISO8601" ]
  }
}
output {
  elasticsearch {
  }
}

I have checked step by step to find the bottleneck filter. The date filter below took the most time. Can you guide me on how to tune it to be faster?

date {
  locale => "en"
  match => [ "log_time", "yyyy-MM-dd HH:mm:ss", "MM-dd-yyyy HH:mm:ss.SSS", "ISO8601" ]
}

http://serverfault.com/questions/669534/elasticsearch-performance-tuning#comment818613_669558

Thanks, Devaraj
Elasticsearch Script merge the results of two aggregations
I use aggregations on Elasticsearch version 1.3.8. I have used script aggregations for a while; today they stopped working. Please help, I can't find any solution.

This is the mapping:

"mappings": {
  "product": {
    "properties": {
      "brandId": { "type": "integer" },
      "brandIsActive": { "type": "boolean" },
      "brandLink": { "type": "string", "index": "not_analyzed" },
      "brandName": { "type": "string", "index": "not_analyzed" }
    }
  }
}

This is my query:

POST alias-test/product/_search
{
  "query": { "match_all": {} },
  "aggs": {
    "Brand": {
      "terms": {
        "script": "doc['brandName'].value",
        "size": 0
      }
    }
  }
}

This is the error:

{ "error": "SearchPhaseExecutionException[Failed to execute phase [query_fetch], all shards failed; shardFailures {[g][mizu-20150219142655][0]: RemoteTransportException[[Mammomax][inet[/172.31.37.148:9300]][search/phase/query+fetch]]; nested: SearchParseException[[mizu-20150219142655][0]: query[ConstantScore(*:*)],from[-1],size[-1]: Parse Failure [Failed to parse source [{\"query\":{\"match_all\":{}},\"aggs\":{\"Brand\":{\"terms\":{\"script\":\"doc['brandName'].value\"]]]; nested: ExpressionScriptCompilationException[Field [brandName] used in expression must be numeric]; }]", "status": 400 }

The other query:

POST test/product/_search
{
  "query": { "match_all": {} },
  "aggs": {
    "Brand": {
      "terms": {
        "script": "doc['brandName'].value+'|'+doc['brandLink'].value",
        "size": 0
      }
    }
  }
}
How to index data from multiple database queries into a single Elasticsearch index
Hi,

I want to index data from multiple database queries into Elasticsearch. I have done the same in Solr using the DataImportHandler and DeltaImportHandler. Is there any way to achieve this in Elasticsearch, and to index the output of all these queries at once? From what I have observed, Elasticsearch cannot index multiple queries directly.

Thanks,
Guru Pai.
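Without a DataImportHandler equivalent, one common approach is to merge the rows from the separate queries client-side and push the merged documents through the _bulk API. A sketch under assumed data shapes (the products/prices rows and the "catalog" index name are invented for illustration; in practice the rows would come from a DB-API cursor):

```python
import json

# Hypothetical results of two different database queries.
products = [{"id": 1, "name": "kettle"}]
prices = {1: 9.99}  # product id -> price, from a second query

def to_bulk(rows, prices, index="catalog", doc_type="product"):
    """Merge the two query results into one document per product and emit
    the newline-delimited action/source pairs expected by POST /_bulk."""
    lines = []
    for row in rows:
        doc = dict(row, price=prices.get(row["id"]))
        lines.append(json.dumps(
            {"index": {"_index": index, "_type": doc_type, "_id": row["id"]}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # _bulk bodies must end with a newline
```

The returned string is a valid _bulk body, so all query outputs can be indexed in one request.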
[ES-1.4.0] Snapshot Queue
Hi All,

Is there a way to queue snapshot invocations, e.g. snapshot_1 -> snapshot_2 -> ... -> snapshot_N?

Thanks,
Yarden
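As far as I know there is no built-in snapshot queue in 1.4.x, so one option is to serialize snapshots client-side: poll until the current snapshot finishes, then start the next. A sketch where create and is_in_progress are hypothetical stand-ins for PUT /_snapshot/&lt;repo&gt;/&lt;name&gt; and GET /_snapshot/&lt;repo&gt;/_status:

```python
import time

def run_snapshots_serially(names, create, is_in_progress, poll_interval=1.0):
    """Take snapshots one after another: wait until no snapshot is running,
    then create the next one. `create(name)` would issue the snapshot
    request; `is_in_progress()` would check the repository's _status."""
    for name in names:
        while is_in_progress():
            time.sleep(poll_interval)
        create(name)
```

With a real HTTP client plugged in, calling run_snapshots_serially(["snapshot_1", ..., "snapshot_N"], ...) gives the queued behaviour described above.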
Re: Elasticsearch Script merge the results of two aggregations
The error for the second query:

{ "error": "SearchPhaseExecutionException[Failed to execute phase [query_fetch], all shards failed; shardFailures {[g][test][0]: RemoteTransportException[[Mammomax][inet[/192.168.1.8:9300]][search/phase/query+fetch]]; nested: SearchParseException[[test][0]: query[ConstantScore(*:*)],from[-1],size[-1]: Parse Failure [Failed to parse source [{\"query\":{\"match_all\":{}},\"aggs\":{\"Brand\":{\"terms\":{\"script\":\"doc['brandName'].value+'|'+doc['brandLink'].value\",\"size\":0]]]; nested: ExpressionScriptCompilationException[Failed to parse expression: doc['brandName'].value+'|'+doc['brandLink'].value]; nested: ParseException[unexpected character ''' at position (23).]; nested: NoViableAltException; }", "status": 400 }

--
Best Regards,
ALİ BALCI
I want to implement the QueryElevationComponent feature in Elasticsearch
Hi All,

I want to implement the equivalent of Solr's QueryElevationComponent in Elasticsearch. Can anyone suggest how I can go about this?

Thanks,
Guru Pai.
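Elasticsearch has no direct QueryElevationComponent, but one possible approximation (an assumption on my part, not an official equivalent) is to boost hand-picked document ids above the organic results with a bool query. The elevated ids and the "title"/"laptop" match clause below are invented for illustration:

```python
# Curated ids to pin near the top for this particular search.
elevated_ids = ["doc-42", "doc-7"]

# The should clause adds a large boost to the curated ids while the must
# clause still returns the normal organic matches.
body = {
    "query": {
        "bool": {
            "must": [{"match": {"title": "laptop"}}],
            "should": [{"ids": {"values": elevated_ids, "boost": 100}}],
        }
    }
}
```

Unlike Solr's component this does not guarantee an exact position, it only raises the score, so very strong organic matches could still outrank an elevated document.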
Re: problem with using uax_url_email
Hi, for people having the same problem as me, here is an answer I received from Pablo in the PT group:

"About your problem, I believe this is a constraint of Apache Tika [1], which is used by the mapper-attachments plugin. I believe that a search for Tika PDF limitations, or a question on their list, will help you more than we can. Anyway, maybe you want to ask on the Elasticsearch main list [2], which is bigger than ours and has the Elasticsearch engineers. I am sorry for not being able to help you that much. Cheers, Pablo

[1] http://tika.apache.org/
[2] elasti...@googlegroups.com"

On Wednesday, 18 February 2015 at 15:37:33 UTC+1, Marria wrote:

Hi everybody, I want to extract URLs from my PDF files. I use the mapper-attachments plugin to index the PDFs. In order to run regex queries and extract all the URLs present in a PDF file, I used the uax_url_email tokenizer:

curl -X PUT localhost:9200/test -d '{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "default": {
            "type": "custom",
            "tokenizer": "uax_url_email",
            "filter": ["standard", "lowercase", "stop"]
          }
        }
      }
    }
  }
}'

and the mapping:

curl -X PUT localhost:9200/test/attachment/_mapping -d '{
  "attachment": {
    "properties": {
      "file": {
        "type": "attachment",
        "fields": {
          "title": { "store": "yes" },
          "file": { "term_vector": "with_positions_offsets", "store": "yes" }
        }
      }
    }
  }
}'

I indexed some PDF files. The problem: for one file I get this (while the URLs in that file start with http://):
https://lh3.googleusercontent.com/-6uzhp-v0qFs/VOSfMU95byI/AUc/H4c6xvb54kg/s1600/Capture%2Bd’écran%2B2015-02-18%2Bà%2B15.17.19.png
For another file, I got this (it drops the http:// prefix):
https://lh3.googleusercontent.com/-1rYIYWJJEbU/VOSfweFpgbI/AUk/bWzfst_uZUE/s1600/Capture%2Bd’écran%2B2015-02-18%2Bà%2B15.19.43.png
Worse, the URLs are not recognized completely; look at this:
https://lh3.googleusercontent.com/-vsKUj5I9MiA/VOSgtyS3yWI/AUw/64lgO4gYSdI/s1600/Capture%2Bd’écran%2B2015-02-18%2Bà%2B15.22.32.png
Is it caused by the two-column layout of the PDF file?
https://lh4.googleusercontent.com/-c7n5-oMygRM/VOShm4hwnWI/AU4/CQNjTTctMnY/s1600/Capture%2Bd’écran%2B2015-02-18%2Bà%2B15.26.46.png
So, what did I do wrong? How can I fix this and use regexp queries successfully to extract all the URLs? Thank you
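As a client-side cross-check (an assumption of mine, not part of the plugin), you can regex-extract the URLs from the text Tika returns and compare them with what the uax_url_email tokenizer produced. Rejoining hyphenated line breaks first matters for two-column PDFs, where a URL is often split across lines:

```python
import re

# Simple URL pattern; good enough for a sanity check, not a full RFC parser.
URL_RE = re.compile(r"https?://[^\s]+")

def extract_urls(text):
    """Rejoin words hyphenated across line breaks (common in two-column PDF
    extraction), then pull out anything that looks like a URL."""
    joined = text.replace("-\n", "").replace("\n", " ")
    return URL_RE.findall(joined)
```

If the URLs recovered this way are complete while the indexed tokens are truncated, the damage happened before the tokenizer, i.e. in the Tika text extraction, which matches Pablo's suggestion above.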
Re: Aggregations failing on fields with custom analyzer..
Did you apply your analyzer to your mapping?

David

On 19 February 2015 at 08:53, Anil Karaka <anilkar...@gmail.com> wrote:

Posted on Stack Overflow as well: http://stackoverflow.com/questions/28601082/terms-aggregation-failing-on-string-fields-with-a-custom-analyzer-in-elasticsear

On Thursday, February 19, 2015 at 1:01:40 PM UTC+5:30, Anil Karaka wrote:

I wanted a custom analyzer that behaves exactly like not_analyzed, except that fields are case-insensitive. My analyzer is below:

"index": {
  "analysis": {
    "analyzer": {
      // keyword tokenizer plus lowercase filter: same as not_analyzed but case-insensitive
      "case_insensitive_keyword_analyzer": {
        "tokenizer": "keyword",
        "filter": "lowercase"
      }
    }
  }
}

But when I try a terms aggregation over a field with strings analyzed as above, I get this error:

{ "error": "ClassCastException[org.elasticsearch.search.aggregations.bucket.terms.DoubleTerms$Bucket cannot be cast to org.elasticsearch.search.aggregations.bucket.terms.StringTerms$Bucket]", "status": 500 }

Are there additional settings I have to add to my custom analyzer for the terms aggregation to work? The better question: I want a custom analyzer that does everything like not_analyzed but case-insensitively. How do I achieve that?
Re: Aggregations failing on fields with custom analyzer..
I think you are doing something wrong.

DELETE index
PUT index
{
  "mappings": {
    "doc": {
      "properties": {
        "foo": { "type": "double" }
      }
    }
  }
}
PUT index/doc/1
{ "foo": "bar" }

gives:

{ "error": "MapperParsingException[failed to parse [foo]]; nested: NumberFormatException[For input string: \"bar\"];", "status": 400 }

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr | @scrutmydocs https://twitter.com/scrutmydocs

On 19 February 2015 at 09:39, Anil Karaka <anilkar...@gmail.com> wrote:

A sample document and the index mappings:

"_source": {
  "Sort": "",
  "gt": "2015-02-18T15:07:10",
  "uid": "54867dc55b482b04da7f23d8",
  "usId": "54867dc55b482b04da7f23d7",
  "ut": "2015-02-18T20:37:10",
  "act": "productlisting",
  "st": "2015-02-18T15:07:46",
  "Filter": "",
  "av": "3.0.0.0",
  "ViewType": "SmallSingleList",
  "os": "Windows",
  "categoryid": "home-kitchen-curtains-blinds"
}

"properties": {
  "uid": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" },
  "ViewType": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" },
  "usId": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" },
  "os": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" },
  "Sort": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" },
  "Filter": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" },
  "categoryid": { "type": "double" },
  "gt": { "format": "dateOptionalTime", "type": "date" },
  "ut": { "format": "dateOptionalTime", "type": "date" },
  "st": { "format": "dateOptionalTime", "type": "date" },
  "act": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" },
  "av": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" }
}

On Thursday, February 19, 2015 at 2:03:11 PM UTC+5:30, David Pilato wrote:

I don't know without a concrete example. I'd say that if your mapping has a number type and you send "123", it could work.

On 19 February 2015 at 09:30, Anil Karaka wrote:

It was my mistake: the field I was trying to aggregate on was mapped as double. I had assumed it was a string after seeing some sample documents with strings. Why didn't ES throw an error when I indexed docs with strings instead of doubles?
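For reference, a sketch of the settings/mapping pairing this thread converges on: keyword tokenizer plus lowercase filter, applied to a string field so it behaves like not_analyzed but case-insensitively. The index layout and the "os" field are illustrative, not taken verbatim from the cluster in the thread:

```python
# Settings and mappings for a case-insensitive, untokenized string field.
# PUT this body when creating the index (analyzers cannot be added to an
# existing field without reindexing).
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "case_insensitive_keyword_analyzer": {
                    "tokenizer": "keyword",   # whole value as one token
                    "filter": ["lowercase"],  # ...lowercased
                }
            }
        }
    },
    "mappings": {
        "doc": {
            "properties": {
                "os": {
                    "type": "string",
                    "analyzer": "case_insensitive_keyword_analyzer",
                }
            }
        }
    },
}
```

The crucial point from the thread still applies: the analyzer must actually be referenced in the field's mapping, and the field must really be a string. A field mapped as double will produce DoubleTerms buckets no matter what analyzer is configured.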
disappearing log records: what can I do?
Hello group,

I have the following stack to manage the logging of an in-company application: fluentd (td-agent 2.1.3), Elasticsearch (1.4.4), Kibana (3.1.2). At first glance this seems to work OK, but from time to time the record counts reported in Kibana don't match the line counts of the logfiles. Diving into this, it appears that when very large logfiles are put in the fluentd log directory, not all records show up in Elasticsearch. Nothing appears in the logging of either fluentd or Elasticsearch, so at first glance everything seems fine.

I started by looking at fluentd and managed to get extra information, which seems to indicate that all of the log lines are processed. Comparing wc -l of the logfile with the contents of ES makes the difference visible:

ES: 645551
wc -l: 647506 (groot.log)

Looking at the thread pool statistics with the REST API, ES reports 60 bulk rejections.

Right now I have a very simple configuration:

cluster.name: cwc-dev
index.number_of_replicas: 0
index.indexing.slowlog.threshold.index.warn: 10s
index.indexing.slowlog.threshold.index.info: 5s
index.indexing.slowlog.threshold.index.debug: 2s
path.data: /data/elasticsearch

I hope you can support me in tackling this, since I am quite new to ES and don't yet know which ways are available to get extra information on this.

thanks in advance,
Ruud
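The 60 bulk rejections line up with the missing records: when the bulk thread pool queue is full, ES rejects items, and they are lost unless the client resends them. A hedged sketch of retrying only the rejected items with exponential backoff (send stands in for the real HTTP bulk call and is assumed to return the rejected items, an empty list meaning success):

```python
import time

def bulk_with_retry(send, payload, retries=5, backoff=1.0):
    """Resend a bulk payload while the thread pool rejects items.
    `send(payload)` performs the bulk request and returns the list of
    rejected items (ES reports these per item in the bulk response)."""
    for attempt in range(retries):
        rejected = send(payload)
        if not rejected:
            return True
        payload = rejected                 # retry only what was rejected
        time.sleep(backoff * (2 ** attempt))  # back off to let ES drain
    return False
```

Whether fluentd's elasticsearch output does this kind of retry depends on the plugin version and its buffer/retry settings; checking that, or throttling the ingest rate, would be the first things to try.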
Using 'elapsed' plugin with GrayLog2
I am working with a team creating a proof-of-concept monitoring and analytics system using a combination of Logstash, MongoDB, Elasticsearch and Graylog2. I was wondering if anyone has experience using the elapsed plugin with this setup and can tell me what issues, if any, they encountered?
waited for 30s and no initial state was set by the discovery
I know other people have posted this, but I've tried everything the other threads suggested. We keep getting this at startup of our Java node client:

INFO  [2015-02-19 17:57:45,206] org.elasticsearch.node: [localhost] version[1.4.2], pid[11103], build[927caff/2014-12-16T14:11:12Z]
INFO  [2015-02-19 17:57:45,207] org.elasticsearch.node: [localhost] initializing ...
INFO  [2015-02-19 17:57:45,217] org.elasticsearch.plugins: [localhost] loaded [cloud-aws], sites []
INFO  [2015-02-19 17:57:47,625] org.elasticsearch.node: [localhost] initialized
INFO  [2015-02-19 17:57:47,625] org.elasticsearch.node: [localhost] starting ...
INFO  [2015-02-19 17:57:47,716] org.elasticsearch.transport: [localhost] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.99.157.0:9300]}
INFO  [2015-02-19 17:57:49,747] org.elasticsearch.discovery: [localhost] elasticsearch-dev/EqKAxZm9SCutQGd-0_SonA
WARN  [2015-02-19 17:58:19,749] org.elasticsearch.discovery: [localhost] waited for 30s and no initial state was set by the discovery
INFO  [2015-02-19 17:58:19,761] org.elasticsearch.http: [localhost] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.99.157.0:9200]}
INFO  [2015-02-19 17:58:19,761] org.elasticsearch.node: [localhost] started

This is the elasticsearch.yml on our master (and only data node right now):

plugin.mandatory: cloud-aws
cloud:
  aws:
    region: us-west-2
    access_key: ACCESS_KEY
    secret_key: SECRET_KEY
discovery:
  type: ec2
  ec2:
    groups: DevAll

And this is our Java node client:

Settings settings = ImmutableSettings.settingsBuilder()
    .put("node.name", nodeName)
    .put("cloud.aws.access_key", awsAccessKey)
    .put("cloud.aws.secret_key", awsSecretKey)
    .put("cloud.node.auto_attributes", true)
    .put("discovery.type", "ec2")
    .build();
this.node = nodeBuilder()
    .clusterName(clusterName)
    .settings(settings)
    .client(true)
    .node();
this.client = node.client();

Any help would be greatly appreciated!
Re: Cluster hanging on node failure
I posted here too: http://stackoverflow.com/questions/28601885/cluster-hanging-on-node-failure. Would love to get some help with this.

Best,
Max

On Wednesday, 18 February 2015 at 20:30:46 UTC+1, Max Charas wrote:

Hello all of you bright people,

We're currently running a smallish 300 GB cluster in production on 5 nodes with around 30 million docs. Everything works flawlessly except when a node really goes down (I mean network or hardware failure, kill -9). When we lose a node, the cluster becomes more or less completely unresponsive for a few minutes, both for indexing and querying. This is, of course, less than ideal, as we have load 24/7.

I would really appreciate some help understanding best-practice settings for a robust cluster. Our first goal is for the cluster not to become unresponsive when a node crashes. After reading everything I could find on the web, I still can't tell whether ES is designed to be unresponsive for ping_retries * ping_timeout seconds, or whether the cluster should continue to serve query requests during that time. Could anyone help me shed light on this?

Secondly, in the event of an even worse failure where the cluster goes into red state, would it be possible to allow the cluster to still serve read/query requests?

I would be ever so grateful to anyone willing to help me understand how this works, or what we would need to change to make our ES installation more robust. I've included our config here:

cluster.name: clustername
node.name: nodename
path.data: /index
node.master: true
node.data: true
discovery.zen.minimum_master_nodes: 3
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.multicast.ping.enabled: false
discovery.zen.ping.unicast.enabled: true
discovery.zen.ping.unicast.hosts: ["host1", "host2", "host3"]
bootstrap.mlockall: true
index.number_of_shards: 10
action.disable_delete_all_indices: true
marvel.agent.exporter.es.hosts: ["marvel:9200"]
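One sanity check on the config above: discovery.zen.minimum_master_nodes should be a quorum of master-eligible nodes, floor(n/2) + 1, which for the 5 master-eligible nodes in this cluster is indeed 3:

```python
def minimum_master_nodes(master_eligible):
    """Quorum of master-eligible nodes: floor(n/2) + 1. Setting
    discovery.zen.minimum_master_nodes below this risks split-brain;
    setting it above it makes the cluster refuse to elect a master
    after losing too few nodes."""
    return master_eligible // 2 + 1
```

With 5 master-eligible nodes, the cluster can lose up to 2 nodes and still elect a master, so the multi-minute hang is more likely about fault detection timeouts and shard recovery than about the quorum setting itself.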
Re: waited for 30s and no initial state was set by the discovery
Additional information: we can telnet into our ES server from our application server on port 9300.
Re: ClassNotFoundException: org.elasticsearch.discovery.ec2.Ec2DiscoveryModule
Thanks, David. I eventually found the pom in the GitHub repo. Thanks for adding the documentation!

On Wednesday, February 18, 2015 at 10:39:19 PM UTC-8, David Pilato wrote:

Yes. This should be added to the doc: https://github.com/elasticsearch/elasticsearch-cloud-aws/issues/176

You need to add this dependency if you are using a NodeClient:

<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-cloud-aws</artifactId>
  <version>2.4.1</version>
</dependency>

HTH,
David

On 19 February 2015 at 01:15, Diana Tuck <dtu...@gmail.com> wrote:

New to ES. I'm trying to use the elasticsearch-cloud-aws plugin, but when starting my Java client node I'm getting a ClassNotFoundException on org.elasticsearch.discovery.ec2.Ec2DiscoveryModule. Do I need to install this plugin on Java client nodes, and if so, how does one do that? Or rather, is there a Maven dependency that can be referenced to load these required classes?

For reference, the elasticsearch.yml is:

plugin.mandatory: cloud-aws
cloud:
  aws:
    access_key: **
    secret_key: *
discovery:
  type: ec2

and my Java client code is:

Settings settings = ImmutableSettings.settingsBuilder()
    .put("node.name", nodeName)
    .put("cloud.aws.access_key", awsAccessKey)
    .put("cloud.aws.secret_key", awsSecretKey)
    .put("cloud.node.auto_attributes", true)
    .put("discovery.type", "ec2")
    .build();
this.node = nodeBuilder()
    .clusterName(clusterName)
    .settings(settings)
    .client(true)
    .node();
this.client = node.client();
Re: elastic search on t2.micro (Amazon WS)
What's the minimum RAM requirement?

slim

On Wednesday, February 18, 2015 at 5:18:58 PM UTC-5, Mark Walkom wrote:

Your only real option here is to get a machine with more RAM. Try spinning up a VM locally, on your desktop/laptop.

On 19 February 2015 at 00:52, Seung Chan Lim wrote:

I'm trying to see if I can get Elasticsearch (ES) 1.3.8 working with Couchbase (CB) 3.0.2 on a t2.micro (Amazon AWS). A t2.micro has 1 GB of RAM, which isn't a lot, but I'm only doing test development on this with not a lot of documents (1000). I just installed ES, followed the CB instructions to install the plugin, and set up XDCR to replicate from CB to ES. I also configured ES with 0 replicas and 1 shard (hoping this would help minimize RAM usage). But I'm still seeing ES lock up the server, making it unresponsive and eventually complaining of lack of memory. Is there something else I can do to get this working on a t2.micro? I'm a complete newbie to ES, and any help would be great, thank you.

slim
Multi Level nested search by using NEST API
I am having trouble making a multi-level nested query with the NEST API. Here is my mapping:

{
  "log": {
    "mappings": {
      "LogEvent": {
        "properties": {
          "@timestamp": { "type": "date", "store": true, "format": "yyyy-MM-dd'T'HH:mm:ss" },
          "records": {
            "type": "nested",
            "properties": {
              "eventtype": { "type": "string", "store": true },
              "detail": { "type": "string", "store": true },
              "others": {
                "type": "nested",
                "properties": {
                  "ScrubbedContent": { "type": "string", "store": true },
                  "RawContent": { "type": "string", "store": true }
                }
              }
            }
          }
        }
      }
    }
  }
}

And here is a query that works:

{
  "from": 0,
  "size": 1,
  "query": {
    "filtered": {
      "filter": {
        "and": {
          "filters": [
            { "range": { "@timestamp": { "gte": "2015-02-12T02:37:32", "lte": "2015-02-19T02:37:32" } } },
            { "nested": { "filter": { "terms": { "records.eventtype": ["myeventtype"] } }, "path": "records" } }
          ]
        }
      }
    }
  }
}

But if I change "path": "records" to "path": "records.others", no results are returned. I am pretty sure I should get results. Any thoughts why?
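Each nested level generally needs its own nested clause, so a filter on the inner records.others type usually has to be wrapped in a nested filter on the outer records path rather than replacing it. A sketch of that shape (the ScrubbedContent value is invented, and this is unverified against the exact mapping):

```python
# Query body with one nested clause per nesting level: the outer clause
# scopes to "records", the inner one to "records.others".
body = {
    "query": {
        "filtered": {
            "filter": {
                "nested": {
                    "path": "records",
                    "filter": {
                        "nested": {
                            "path": "records.others",
                            "filter": {
                                "term": {"records.others.ScrubbedContent": "foo"}
                            },
                        }
                    },
                }
            }
        }
    }
}
```

In NEST this corresponds to nesting one Nested(...) filter descriptor inside another, mirroring the JSON above.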
Re: formula or guidelines to calculate/estimate the index size
There is nothing official. Just create an index, put in one of your documents, and extrapolate.

On 20 February 2015 at 04:26, Gaurav gupta <gupta.gaurav0...@gmail.com> wrote:

Could anyone help me find a formula or guidelines to calculate/estimate the size of the index created by Elasticsearch? I found that there is a formula (an Excel sheet) for Lucene.

Thanks!
Gaurav
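The extrapolation Mark describes is simple arithmetic. A sketch (linear scaling is an assumption here; larger indices often compress somewhat better, so treat the result as a rough upper bound):

```python
def estimate_index_size(sample_bytes, sample_docs, total_docs):
    """Extrapolate index size linearly from a sample index: index a
    representative batch of documents, read the store size from
    GET /_stats, then scale to the expected document count."""
    return sample_bytes / sample_docs * total_docs
```

For a better estimate, index a few thousand representative documents rather than one, so analyzer and field-data overheads are averaged in.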
how to avoid MapperParsingException?
Hi, I'm indexing documents with the following structure:

{ "name": "peter", "email": "p...@p.com", "location": "MIA" }
{ "name": "mary", "email": "m...@m.com", "device": "ipad" }
{ "name": "mary", "email": "m...@m.com", "metadata": { ... } }

As you can see, I only know the types of the name and email fields. The location, device, metadata, or any other field is dynamic. So, in order to avoid a MapperParsingException, I want to persist all of a document's fields but mark ONLY name and email as searchable. Can I do that using mappings?
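One way to get this behavior (a sketch; the type name "user" is made up): set dynamic to false on the type, so unmapped fields are kept in _source but never indexed, while the explicitly mapped fields stay searchable:

```json
{
  "mappings": {
    "user": {
      "dynamic": false,
      "properties": {
        "name":  { "type": "string" },
        "email": { "type": "string" }
      }
    }
  }
}
```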
Re: elastic search on t2.micro (Amazon WS)
What's your heap size (the JVM setting) set to? It needs to be at most half the machine's RAM, so set it to about 500 MB. J

On Wednesday, February 18, 2015 at 3:52:22 PM UTC+2, Seung Chan Lim wrote: I'm trying to see if I can get Elasticsearch (ES) 1.3.8 working with Couchbase (CB) 3.0.2 on a t2.micro (Amazon WS). A t2.micro has 1 GB of RAM, which isn't a lot, but I'm only doing test development on this, with not a lot of documents (~1000). I just installed ES, followed the CB instructions to install the plugin, and set up XDCR to get replication going from CB to ES. I also configured ES with 0 replicas and 1 shard (hoping this would help minimize RAM usage). But I'm still seeing behavior from ES where it locks up the server, making it unresponsive, then eventually complaining of lack of memory. Is there something else I can do to get this working on a t2.micro? I'm a complete newbie to ES, and any help would be great. Thank you, slim
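On a 1 GB box that means something like the following before starting the node (a sketch; the path depends on how ES was installed):

```shell
# cap the JVM heap at roughly half of the t2.micro's 1 GB of RAM
export ES_HEAP_SIZE=512m
./bin/elasticsearch
```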
Re: Elasticsearch index name question
This is possible, but not automatically within ES. Logstash knows it needs to switch to a new index at UTC midnight; you need to find a way to get the river, or some other code, to do the same.

On 19 February 2015 at 21:59, Silvana Vezzoli silvana.vezz...@gmail.com wrote: I use a monitoring framework designed as a solution to monitor heterogeneous networks and systems in terms of services (platforms and applications for TELCO systems). This framework collects the required data synchronously from several devices, stores it in a MongoDB database, and then transfers all stored collections from MongoDB to Elasticsearch via the river-mongodb plugin. We can have a huge amount of data stored in a single Elasticsearch index; for example, about 5.2 million documents can be collected in a single MongoDB collection in only 8 hours of monitoring, so the number of documents in a single index grows rapidly. At present I have installed, on a CentOS 6.5 server, an Elasticsearch cluster configuration with one node and five indices, but only one index for all synchronous data. My problem is being able to create different indices in Elasticsearch across which I can spread the synchronous data, so I would like to know whether it is possible to create an index name with a timestamp appended to it, the way Logstash uses the timestamp from an event to derive the related Elasticsearch index name. Any ideas, suggestions, or help?
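Whatever replaces the river could compute the target index per event the way Logstash does with its logstash-%{+YYYY.MM.dd} convention. A minimal sketch (the prefix and helper name are made up):

```python
from datetime import datetime

def daily_index_name(prefix, when):
    """Derive a Logstash-style daily index name, e.g. metrics-2015.02.20."""
    return "%s-%s" % (prefix, when.strftime("%Y.%m.%d"))

# events collected on the same (UTC) day land in the same index
print(daily_index_name("metrics", datetime(2015, 2, 20, 3, 15)))
```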
Move Data from One index to Another Cluster Index
How can one move data from one index on a server to another index on another server? I know reindexing moves data into another index in the same cluster. How can it be done in Python? Ap
Re: Elasticsearch performance tuning
Don't change cache and buffer sizes unless you know what is happening; the defaults are going to be fine. How much heap did you give ES? I'm not sure you can do much about the date filter, though; maybe someone else has pointers.

On 19 February 2015 at 21:12, Deva Raj devarajcse...@gmail.com wrote: Hi Mark Walkom, I have given my logstash conf file below.

Logstash conf:

input {
  file { }
}
filter {
  mutate { gsub => ["message", "\n", ""] }
  mutate { gsub => ["message", "\t", ""] }
  multiline { pattern => "^ " what => "previous" }
  grok {
    match => [ "message", "%{TIME:log_time}\|%{WORD:Message_type}\|%{GREEDYDATA:Component}\|%{NUMBER:line_number}\| %{GREEDYDATA:log_message}" ]
    match => [ "path", "%{GREEDYDATA}/%{GREEDYDATA:loccode}/%{GREEDYDATA:_machine}\:%{DATE:logdate}.log" ]
    break_on_match => false
  }
  # To check whether location is S or L
  if [loccode] == "S" or [loccode] == "L" {
    ruby {
      code => "temp = event['_machine'].split('_')
               if !temp.nil? || !temp.empty?
                 event['_machine'] = temp[0]
               end"
    }
  }
  mutate {
    add_field => [ "event_timestamp", "%{@timestamp}" ]
    replace => [ "log_time", "%{logdate} %{log_time}" ]
    lowercase => [ "loccode" ]
    # Remove the 'logdate' field since we don't need it anymore.
    remove => [ "logdate" ]
  }
  # to get all site details (site name, city and co-ordinates)
  sitelocator { sitename => "loccode" datafile => "vendor/sitelocator/SiteDetails.csv" }
  date {
    locale => "en"
    match => [ "log_time", "yyyy-MM-dd HH:mm:ss", "MM-dd-yyyy HH:mm:ss.SSS", "ISO8601" ]
  }
}
output {
  elasticsearch { }
}

I have checked the filters step by step to find the bottleneck; the date filter below took the most time. Can you guide me on how I can tune it to be faster?

date {
  locale => "en"
  match => [ "log_time", "yyyy-MM-dd HH:mm:ss", "MM-dd-yyyy HH:mm:ss.SSS", "ISO8601" ]
}

http://serverfault.com/questions/669534/elasticsearch-performance-tuning#comment818613_669558

Thanks, Devaraj
Re: Help with 4 node cluster
You can have 4 master-eligible nodes if you want; just set discovery.zen.minimum_master_nodes to 3 to ensure a quorum. But ideally, as Christian mentioned, it's best to have an odd number of master-eligible nodes.

On 19 February 2015 at 16:51, christian.dahlqv...@elasticsearch.com wrote: Hi, you always want an odd number of master-eligible nodes (often 3), so I would therefore recommend setting three of the four nodes to be master-eligible and leaving the fourth as a pure data node. This will prevent the cluster from being partitioned into two halves with an equal number of master-eligible nodes on each side of the partition. Best regards, Christian

On Wednesday, February 18, 2015 at 11:39:54 AM UTC, sysads wrote: Hi, I need help setting up 4 Elasticsearch servers. I have installed and configured ES on all 4 nodes, but I am lost as to what the configuration in elasticsearch.yml should be if I want: all 4 nodes to be both master and data nodes; and node A to act as a primary shard while node B acts as its replica, with node C as a primary shard and node D as its replica. Thanks
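Christian's 3-plus-1 layout might look like this in elasticsearch.yml (a sketch; note that with three master-eligible nodes the quorum is two, not three):

```yaml
# on the three master-eligible nodes
node.master: true
node.data: true
discovery.zen.minimum_master_nodes: 2   # quorum of 3 master-eligible nodes

# on the fourth, data-only node:
# node.master: false
# node.data: true
```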
Disk awareness on indexing
Hi all, a few of our nodes have disks filled up to 90%. Is there any option to avoid indexing to those disks instead of relocating shards? Each shard contains roughly 600 GB of data, so relocating is costly, as ours is a very busy cluster. Thanks, Prasath Rajan
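If the nodes run a version with the disk allocation decider (available in recent 1.x releases), the disk watermarks control this. A sketch for elasticsearch.yml (values are illustrative): the low watermark stops new shards being allocated to a nearly full node, and relocation is only triggered above the high watermark, so raising the high watermark defers the expensive moves.

```yaml
cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: "85%"    # no new shards allocated to a node above this
cluster.routing.allocation.disk.watermark.high: "95%"   # shards relocated away only above this
```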
indexing- pls help
Hi, I am a beginner here. I just installed Elasticsearch on Windows and added the Sense extension for Google Chrome. I need to index some files that are present in a directory on my local system. I did not install any extension other than Sense. Please tell me what more I should install, and how I can add and index these files. Thanks in advance, Aiswaryalakshmi K
Re: Combining Multiple Queries with 'OR' or 'AND'
Hi, a bool query with should clauses should work:

{
  "query": {
    "bool": {
      "should": [
        { "filtered": { "query": { ... }, "filter": { ... } } },  // Query 1
        { "filtered": { "query": { ... }, "filter": { ... } } }   // Query 2
      ]
    }
  }
}

Masaru

On Fri, Feb 20, 2015 at 8:29 AM, Debashish Paul shima...@gmail.com wrote: Hi,

Question: I am trying to combine two user search queries with AND and OR operations. I am aware of combining queries where we merge filters, but I want to merge entire queries, like { {BIG Elastic Query1} AND {BIG Elastic Query2} }.

Details: for instance, say a user searches for "batman" in the movies type with filters of "Christian" and "Bale", and another query is "Dark Knight" in the tvshows type with a filter of "Christopher Nolan". I want to combine both queries so I can look for both Batman movies and Dark Knight TV shows, but not Dark Knight movies or Batman TV shows. In other words, for the given queries I just want to run Query1 OR Query2 in Elasticsearch.

Query 1:

{
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "Batman",
          "default_operator": "AND",
          "fields": [ "Movies._all" ]
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "query": {
                "filtered": {
                  "filter": {
                    "and": [
                      { "term": { "cast.firstName": "Christian" } },
                      { "term": { "cast.lastName": "Bale" } }
                    ]
                  }
                }
              }
            }
          ]
        }
      }
    }
  }
}

Query 2:

{
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "Dark Knight",
          "default_operator": "AND",
          "fields": [ "tvshows._all" ]
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "query": {
                "filtered": {
                  "filter": {
                    "and": [
                      { "term": { "director.firstName": "Christopher" } },
                      { "term": { "director.lastName": "Nolan" } }
                    ]
                  }
                }
              }
            }
          ]
        }
      }
    }
  }
}
Re: indexing- pls help
You could take a look at the FSRiver plugin; it might help you get started. https://github.com/dadoonet/fsriver David

Le 20 févr. 2015 à 06:42, aiswarya lakshmi ais.k...@gmail.com a écrit : Hi, I am a beginner here. I just installed Elasticsearch on Windows and added the Sense extension for Google Chrome. I need to index some files that are present in a directory on my local system. Please tell me what more I should install, and how I can add and index these files. Thanks in advance, Aiswaryalakshmi K
Re: Elasticsearch performance tuning
I listed the instances and their heap sizes below:

Medium instance: 3.75 GB RAM, 1 core, 4 GB SSD storage, 64-bit. Java heap size: 2 GB
R3 Large: 15.25 GB RAM, 2 cores, 32 GB SSD storage. Java heap size: 7 GB
R3 High-Memory Extra Large (r3.xlarge): 30.5 GB RAM, 4 cores. Java heap size: 15 GB

Thanks, Devaraj

On Friday, February 20, 2015 at 4:15:12 AM UTC+5:30, Mark Walkom wrote: Don't change cache and buffer sizes unless you know what is happening; the defaults are going to be fine. How much heap did you give ES? I'm not sure you can do much about the date filter, though; maybe someone else has pointers.
Re: Elasticsearch Script merge the results of two aggregations
Hi, it looks like you are using Lucene expressions [1]; see the link for their limitations: today they only support numeric values. Since the terms agg doesn't have a lang property, you probably have script.default_lang set to "expression" in elasticsearch.yml? FYI, if you put "lang": "groovy" (and if the configuration allows running dynamic Groovy scripts), your query should work. But make sure you read the release notes [2] before turning on dynamic Groovy scripting (you can use Groovy scripts without turning on dynamic scripting [3]). Masaru

[1] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-scripting.html#_lucene_expressions_scripts
[2] http://www.elasticsearch.org/blog/elasticsearch-1-4-3-and-1-3-8-released/
[3] http://www.elasticsearch.org/blog/running-groovy-scripts-without-dynamic-scripting/

On February 19, 2015 at 22:21:41, ali balci (balci.a...@gmail.com) wrote: The second error:

{ "error": "SearchPhaseExecutionException[Failed to execute phase [query_fetch], all shards failed; shardFailures {[g][test][0]: RemoteTransportException[[Mammomax][inet[/192.168.1.8:9300]][search/phase/query+fetch]]; nested: SearchParseException[[test][0]: Parse Failure [Failed to parse source]]; nested: ExpressionScriptCompilationException[Failed to parse expression: doc['brandName'].value+'|'+doc['brandLink'].value]; nested: ParseException[unexpected character ''' at position (23)]; nested: NoViableAltException; }]", "status": 400 }

2015-02-19 14:49 GMT+02:00, ali balci: I use aggregations on Elasticsearch version 1.3.8. I have been using aggregation scripts for a while; today they didn't work.
Please help, I can't find any solution. This is the mapping:

"mappings": {
  "product": {
    "properties": {
      "brandId": { "type": "integer" },
      "brandIsActive": { "type": "boolean" },
      "brandLink": { "type": "string", "index": "not_analyzed" },
      "brandName": { "type": "string", "index": "not_analyzed" }
    }
  }
}

This is my query:

POST alias-test/product/_search
{
  "query": { "match_all": {} },
  "aggs": {
    "Brand": {
      "terms": { "script": "doc['brandName'].value", "size": 0 }
    }
  }
}

This is the error:

{ "error": "SearchPhaseExecutionException[Failed to execute phase [query_fetch], all shards failed; shardFailures {[g][mizu-20150219142655][0]: RemoteTransportException[[Mammomax][inet[/172.31.37.148:9300]][search/phase/query+fetch]]; nested: SearchParseException[[mizu-20150219142655][0]: Parse Failure [Failed to parse source]]; nested: ExpressionScriptCompilationException[Field [brandName] used in expression must be numeric]; }]", "status": 400 }

The other query:

POST test/product/_search
{
  "query": { "match_all": {} },
  "aggs": {
    "Brand": {
      "terms": { "script": "doc['brandName'].value+'|'+doc['brandLink'].value", "size": 0 }
    }
  }
}
-- Best regards, ALİ BALCI
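Masaru's fix, spelled out against the failing concatenation aggregation (a sketch, assuming the configuration permits Groovy scripting):

```json
{
  "query": { "match_all": {} },
  "aggs": {
    "Brand": {
      "terms": {
        "script": "doc['brandName'].value + '|' + doc['brandLink'].value",
        "lang": "groovy",
        "size": 0
      }
    }
  }
}
```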
Re: Move Data from One index to Another Cluster Index
The Python client has a reindex helper that can do just that: supply a client instance for the source and destination clusters. http://elasticsearch-py.readthedocs.org/en/latest/helpers.html#elasticsearch.helpers.reindex Hope this helps, Honza

On Thu, Feb 19, 2015 at 11:44 PM, Amay Patil amaypati...@gmail.com wrote: How can one move data from one index on a server to another index on another server? I know reindexing moves data into another index in the same cluster. How can it be done in Python? Ap
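A sketch of the cross-cluster case with that helper (host and index names are made up, and this needs both clusters reachable, so it is untested here):

```python
from elasticsearch import Elasticsearch
from elasticsearch.helpers import reindex

# one client per cluster
source = Elasticsearch(["http://source-host:9200"])
target = Elasticsearch(["http://target-host:9200"])

# scan every document out of "old-index" on the source cluster and
# bulk-index it into "new-index" on the target cluster
reindex(source, "old-index", "new-index", target_client=target)
```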
Re: What tools to use for indexing ES in production?
Kafka is fine; you could also use Logstash to read this data and send it to ES: http://www.elasticsearch.org/guide/en/logstash/current/plugins-inputs-kafka.html. NB: this is in the 1.5 beta release, so use it with caution. Otherwise, take a look at the other Logstash inputs and see if there is something suitable you can leverage; there are also a number of official clients if you want to roll your own. There isn't one official method, but those last two are pretty common.

On 19 February 2015 at 20:07, Kevin Liu ke...@ticketfly.com wrote: Well, the sales messages are coming through Kafka. We need to extract some info from the database. We can do anything, really; I'm just not sure what the common practice is here. There seem to be so many options. What kind of questions am I not asking here?
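A sketch of the Logstash 1.5 Kafka-input route mentioned above (the topic and Zookeeper address are made up, and the parameter names are from the beta-era plugin, so double-check them against its docs):

```
input {
  kafka {
    zk_connect => "localhost:2181"
    topic_id   => "sales-events"
  }
}
output {
  elasticsearch { }
}
```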
Re: I want to implement QueryElevationComponent feature in ELasticSearch
I have implemented query elevation as a function-score-based conditional boosting plugin; see https://github.com/jprante/elasticsearch-functionscore-conditionalboost Jörg

On Thu, Feb 19, 2015 at 2:40 PM, Gurunath pai pai.gurun...@gmail.com wrote: Hi all, I want to implement the QueryElevationComponent feature in Elasticsearch. Can anyone suggest how I can go ahead with this feature in Elasticsearch? Thanks, Guru Pai.
formula or guidelines to calculate/estimate the index size
Could anyone assist me in finding a formula or guidelines to calculate/estimate the size of an index created by Elasticsearch? I found that there is a formula (an Excel sheet) for Lucene. Thanks! Gaurav
Re: [Hadoop][Spark] Exclude metadata fields from _source
Thanks for the response Costin! As you mentioned, option 1, i.e es.mapping.exclude, is more appropriate when working with JSON. Since it doesn't seem to work, I've followed your advice and raised a new issue (https://github.com/elasticsearch/elasticsearch-hadoop/issues/381) including a small test application to reproduce. I'd be happy to hear what you think of it. Thanks again, Itai On Wednesday, February 18, 2015 at 7:42:36 PM UTC+2, Costin Leau wrote: Hi Itay, Sorry I missed your email. I'm not clear from your post how your documents look like - can you post a gist somewhere with your JSON input that you are sending to Elasticsearch? Typically the metadata appear in the _source if they are declared that way. You should be able to go around this by using: 1. es.mapping.exclude - if it doesn't seem to be working 2. in case of Spark, by specifying the metadata through the `saveWithMeta` methods which allows it to stay decoupled from the object itself. Since you are using JSON likely 1 is your best shot. If it doesn't work for you can you please raise an issue with a quick/small sample to be able to reproduce it? Thanks, On Wed, Feb 18, 2015 at 10:27 AM, Itai Yaffe it...@exelate.com javascript: wrote: Hey, Has anyone experienced with such an issue? Perhaps Costin can help here? Thanks! On Thursday, February 12, 2015 at 8:27:14 AM UTC+2, Itai Yaffe wrote: Hey, I've recently started using Elasticsearch for Spark (Scala application). I've added elasticsearch-spark_2.10 version 2.1.0.BUILD-SNAPSHOT to my Spark application pom file, and used org.apache.spark.rdd.RDD[String].saveJsonToEs() to send documents to Elasticsearch. When the documents are loaded to Elasticsearch, my metadata fields (e.g id, index, etc.) are being loaded as part of the _source field. Is there a way to exclude them from the _source? 
I've tried using the new es.mapping.exclude configuration property (added in this commit https://github.com/elasticsearch/elasticsearch-hadoop/commit/aae4f0460a23bac9567ea2ad335c74245a1ba069 - that's why I needed to take the latest build rather than using version 2.1.0.Beta3), but it doesn't seem to have any effect (although I'm not sure it's even possible to exclude fields I'm using for mapping, e.g. es.mapping.id). A code snippet (I'm using a single-node Elasticsearch cluster for testing purposes and running the Spark app from my desktop):

val conf = new SparkConf()...
conf.set("es.index.auto.create", "false")
conf.set("es.nodes.discovery", "false")
conf.set("es.nodes", "XXX:9200")
conf.set("es.update.script", "XXX")
conf.set("es.update.script.params", "param1:events")
conf.set("es.update.retry.on.conflict", "2")
conf.set("es.write.operation", "upsert")
conf.set("es.input.json", "true")
val documentsRdd = ...
documentsRdd.saveJsonToEs("test/user", scala.collection.Map("es.mapping.id" -> "_id", "es.mapping.exclude" -> "_id"))

The JSON looks like this:

{ "_id": "", "_type": "user", "_index": "test", "params": { "events": [ { ... } ] } }

Thanks!
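While the connector issue is open, the same effect can be approximated client-side by stripping the metadata keys from each JSON document before handing the RDD to saveJsonToEs (and supplying the id via es.mapping.id as above). A sketch in plain Python just to show the transformation itself — the field names come from the JSON above; this is not elasticsearch-hadoop code:

```python
import json

# ES metadata keys that should not end up in _source
METADATA_FIELDS = {"_id", "_index", "_type"}

def strip_metadata(doc_json: str) -> str:
    """Return the document JSON with ES metadata keys removed."""
    doc = json.loads(doc_json)
    return json.dumps({k: v for k, v in doc.items() if k not in METADATA_FIELDS})

print(strip_metadata('{"_id": "42", "_type": "user", "_index": "test", "params": {"events": []}}'))
```

In the Spark job this would be a `documentsRdd.map(...)` applying the equivalent transformation before the save call.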
Re: Elasticsearch cluster behind Azure public load balancer
Do not run ES across regions; it's a bad idea. Use snapshot-and-restore or some other method to replicate data. On 19 February 2015 at 20:51, Subodh Patil subodh00...@gmail.com wrote: Ok, thanks Mark. So you are saying I should set node.master: true as well as node.data: true for all 3 nodes in the cluster. And will this hold true for a 4-node or even larger cluster? As of now I am making data redundant on all 3 nodes, but please advise on performance... The Azure load balancer's chances of failure are much lower than a VM's. The redundancy is required from a backup point of view. So I will have a 4-node cluster with data replicated on all 4 nodes: 3 nodes in the same region (say East US) behind the load balancer, and the 4th node in a different region (say West US), NOT behind the load balancer, just holding all the data. In case the East US data center has a problem, I can redirect all traffic to the West US data center, where the single node will always have all up-to-date data, or I can even take backups from the West US data center. Thanks, Subodh On Thursday, February 19, 2015 at 7:10:52 AM UTC+5:30, Mark Walkom wrote: Yes, the master can serve requests. You don't really want 2 masters and 1 data node though; make all 3 master+data to start with. And sure, the client can be a SPOF, but then isn't a single load balancer a SPOF as well? So the question remains: where are you happy dealing with these points? Because at some point you cannot make *everything* redundant without being excessive. On 18 February 2015 at 21:36, Subodh Patil subod...@gmail.com wrote: I am trying to set up an ES cluster behind an Azure load balancer/cloud service. The cluster is 3 nodes with no specific data/client/master node settings. By default, 2 nodes are elected as master and 1 as data node. As the requests (create/update/search) from the application come to the Azure load balancer on port 9200, which is balanced across all 3 VMs, a request can go to any VM. Will a master node be able to serve the requests?
Many articles say that you don't need a load balancer for an ES cluster, just a client node, but then that becomes a single point of failure, as an Azure VM can go down at any point in time. So load balancing is required mainly for high availability from an infrastructure point of view. Please suggest a cluster setup and which nodes (data or client) to put behind the load balancer. ES version 1.4.1 on Windows Server 2012 R2 VMs.
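For reference, the three-node master+data layout Mark describes would look like this in each node's elasticsearch.yml. The minimum_master_nodes quorum setting is an assumption on my part — it is not mentioned in the thread, but it is the usual companion to an all-master-eligible layout to avoid split-brain:

```yaml
# identical on all three nodes (ES 1.x)
node.master: true
node.data: true
# quorum of master-eligible nodes: (3 / 2) + 1 = 2
discovery.zen.minimum_master_nodes: 2
```

With a 4th node added in another region (which the advice above recommends against), the quorum would need recomputing; snapshot-and-restore to the remote region avoids that problem entirely.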
Elasticsearch Broken?
Hello, I recently started attempting to use Elasticsearch to store some data. In the learning process, I was doing ok until I attempted to use the bulk API. I used something along the lines of:

conn = ES(url, timeout, bulksize)
for each (tuple):
    data = something(tuple)
    conn.index(data, index name, count, bulk=true)

which I imagined would add a large number of items at localhost:9200/(index name)/(type name)/(count), but it ended up adding a large amount of garbage at localhost:9200/(index name). Not sure how to proceed, but knowing that I needed a reset, I went ahead and did a DELETE on localhost:9200/*, which deleted everything as expected. I then attempted to begin again by adding an index with a mapping:

curl -XPUT http://localhost:9200/(index name) -d mapping.json

mapping.json:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "ngram_analyzer": { "tokenizer": "ngram_tokenizer" }
      },
      "tokenizer": {
        "ngram_tokenizer": {
          "type": "nGram",
          "min_gram": 2,
          "max_gram": 3,
          "token_chars": [ "letter", "digit", "symbol", "punctuation", "whitespace" ]
        }
      }
    }
  },
  "mappings": {
    "access": {
      "properties": {
        "date": { "type": "date", "format": "yyyy-MM-dd", "analyzer": "ngram_analyzer" },
        "time": { "type": "date", "format": "HH:mm:ss", "analyzer": "ngram_analyzer" },
        "protocol": { "type": "string", "analyzer": "ngram_analyzer" },
        "source ip": { "type": "string", "analyzer": "ngram_analyzer" },
        "source port": { "type": "integer" },
        "country": { "type": "string", "analyzer": "ngram_analyzer" },
        "organization": { "type": "string", "analyzer": "ngram_analyzer" },
        "dest ip": { "type": "string", "analyzer": "ngram_analyzer" },
        "dest port": { "type": "integer" }
      }
    }
  }
}

This is where the troubles began. While the above command created an index at (index name), it refused to populate the index with my settings.
{
  "darknet" : {
    "aliases" : { },
    "mappings" : { },
    "settings" : {
      "index" : {
        "creation_date" : "1424425712525",
        "mapping" : { "json" : "" },
        "uuid" : "cgTEPkqnQJKejLPyHqVNYA",
        "number_of_replicas" : "1",
        "number_of_shards" : "5",
        "version" : { "created" : "1040399" }
      }
    },
    "warmers" : { }
  }
}

I attempted to update the index with curl -XPUT http://localhost:9200/(index name)/_setting -d mapping.json after closing the index, but that also left the index blank. I deleted the index, created another index, etc. etc., but to no avail. In the end, I managed to make a change to the index, though not a good one: I made a call to the update-mapping API, which changed the index into a horrifying monstrosity with repeating tags, like so:

{
  "darknet" : {
    "aliases" : { },
    "mappings" : {
      "1" : {
        "properties" : {
          "mappings" : {
            "properties" : {
              "access" : {
                "properties" : {
                  "properties" : {
                    "properties" : {
                      "country" : { "properties" : { "analyzer" : { "type" : "string" }, "type" : { "type" : "string" } } },
                      "date" : { "properties" : { "analyzer" : { "type" : "string" }, "format" : { "type" : "string" }, "type" : { "type" : "string" } } },
                      "dest ip" : { "properties" : { "analyzer" : { "type" : "string" }, "type" : { "type" : "string" } } },
                      "dest port" : { "properties" : { "type" : { "type" : "string" } } },
                      "organization" : { "properties" : { "analyzer" : { "type" : "string"
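One detail worth checking in the commands above (an observation, not a confirmed diagnosis — though the returned settings do contain a literal mapping.json entry): without a leading @, curl sends the string mapping.json itself as the request body rather than the file's contents. The file-reading form is:

```
# '@' tells curl's -d option to read the request body from the named file
curl -XPUT 'http://localhost:9200/darknet' -d @mapping.json
```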
Incorrect results from geo_shape filter
Hey Everyone, I have a nested document that includes a spatial component that I'm using in a spatial query via the geo_shape filter. I've noticed that the filter consistently returns a result that includes a geometry that clearly falls outside the spatial filter. I've tested the spatial intersection in PostGIS, and that correctly returns no result for the same query. Can anyone else verify this? We have lots of other MultiPolygon data that seems to work just fine. I'm using the REST API against ES v1.4.1.

*Here's the document I'm testing against:*

{
  "dates": [],
  "geometries": [{
    "value": {
      "type": "MultiPolygon",
      "coordinates": [
        [ [ [-118.32608, 34.07035], [-118.32657, 34.07035], [-118.32657, 34.07054], [-118.32608, 34.07054], [-118.32608, 34.07035] ] ],
        [ [ [-118.32608, 34.07021], [-118.32608, 34.07004], [-118.32657, 34.07004], [-118.32657, 34.07021], [-118.32608, 34.07021] ] ]
      ]
    },
    "label": "MULTIPOLYGON (((-118.32608 34.07035,-118.32657 34.07035,-118.32657 34.07054,-118.32608 34.07054,-118.32608 34.07035)),((-118.32608 34.07021,-118.32608 34.07004,-118.32657 34.07004,-118.32657 34.07021,-118.32608 34.07021)))",
    "child_entities": [],
    "entitytypeid": "SPATIAL_COORDINATES_GEOMETRY.E47",
    "parentid": "80446382-0e96-4db1-8a37-95aca166b785",
    "entityid": "81b06621-2989-4825-98f3-88ee2ac937b6",
    "property": "P87",
    "businesstablename": "geometries"
  }],
  "child_entities": [{
    "value": "Windsor Square District Non-Contributor",
    "label": "Windsor Square District Non-Contributor",
    "child_entities": [],
    "entitytypeid": "NAME.E41",
    "parentid": "5fbe787f-8fc3-4986-b77a-38828699b68c",
    "entityid": "62a82362-b7c3-4ad2-ba58-29bab7dc5dbb",
    "property": "P1",
    "businesstablename": "strings"
  }, {
    "value": "",
    "label": "",
    "child_entities": [],
    "entitytypeid": "PLACE.E53",
    "parentid": "5fbe787f-8fc3-4986-b77a-38828699b68c",
    "entityid": "80446382-0e96-4db1-8a37-95aca166b785",
    "property": "P53",
    "businesstablename": ""
  }],
  "label": "",
  "date_groups": [],
  "primaryname": "Windsor Square District Non-Contributor",
  "value": "",
  "entitytypeid": "HERITAGE_RESOURCE.E18",
  "domains": [{
    "conceptid": "a5675b84-fed4-4839-9afa-434be64c3899",
    "child_entities": [],
    "label": "Primary",
    "value": "b171b37b-4c78-4b51-91a9-503541511be4",
    "entitytypeid": "NAME_TYPE.E55",
    "parentid": "62a82362-b7c3-4ad2-ba58-29bab7dc5dbb",
    "entityid": "4bd86b01-a047-4e7a-953a-00d2243e39db",
    "property": "P2",
    "businesstablename": "domains"
  }],
  "entityid": "5fbe787f-8fc3-4986-b77a-38828699b68c",
  "property": "",
  "businesstablename": ""
}

*Here's the DSL:*

{
  "query": {
    "filtered": {
      "filter": {
        "and": [
          { "bool": {
              "should": [],
              "must_not": [],
              "must": [
                { "nested": {
                    "path": "geometries",
                    "query": {
                      "geo_shape": {
                        "geometries.value": {
                          "shape": { "type": "Point", "coordinates": [ -118.34194465228431, 34.06402964781424 ] }
                        }
                      }
                    }
                } }
              ]
          } }
        ]
      },
      "query": { "match_all": {} }
    }
  },
  "from": 0,
  "size": 5
}

*Here's the mapping for the index:*

"properties": {
  "businesstablename": { "index": "not_analyzed", "type": "string" },
  "child_entities": {
    "type": "nested",
    "properties": {
      "businesstablename": { "index": "not_analyzed", "type": "string" },
      "entitytypeid": { "index": "not_analyzed", "type": "string" },
      "property": { "index": "not_analyzed", "type": "string" },
      "entityid": { "index": "not_analyzed", "type": "string" },
      "label": { "index": "not_analyzed", "type": "string" },
      "value": {
        "type": "string",
        "fields": {
          "raw": { "index": "not_analyzed", "type": "string" },
          "folded": { "analyzer": "folding", "type": "string" }
        }
      },
      "parentid": { "index": "not_analyzed", "type": "string" }
    }
  },
  "entitytypeid": { "index": "not_analyzed", "type": "string" },
  "domains": {
    "type": "nested",
    "properties": {
      "businesstablename": { "index": "not_analyzed", "type": "string" },
      "entitytypeid": { "index": "not_analyzed", "type": "string" },
      "property": { "index": "not_analyzed", "type": "string"
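The geo_shape part of the mapping is cut off above, but one common cause of exactly this symptom (an assumption, not a confirmed diagnosis): geo_shape fields are indexed on a spatial grid, and with a coarse precision/tree_levels setting the whole enclosing grid cell matches, so points well outside the polygon still return hits. Declaring a finer precision on the field, along these lines:

```json
"geometries": {
  "type": "nested",
  "properties": {
    "value": { "type": "geo_shape", "tree": "quadtree", "precision": "10m" }
  }
}
```

costs index size and indexing time but tightens the match; the index has to be rebuilt for the change to take effect, since precision is fixed at index time.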
Re: Incorrect results from geo_shape filter
FYI, you can also try a search using these coordinates (which are clearly incorrect) in your filter, and it still returns a result: "coordinates": [ -119, 35 ]. On Thursday, February 19, 2015 at 3:04:29 PM UTC-8, Alexei Peters wrote: [quoted text hidden]
Java node client failing to send join request to master
Anyone have any idea where to start with this one? We're running our singular master/data node in a Docker container, trying to connect via a Java node client from different boxes. Please help!!

INFO [2015-02-20 02:55:11,411] org.elasticsearch.discovery.ec2: [localhost] failed to send join request to master [[esNode][jU4Y42dtQfCPJn1-5jFfYw][esHost][inet[/255.255.255.255]]{master=true}], reason [RemoteTransportException[[esNode][inet[/255.255.255.255:9300]][internal:discovery/zen/join]]; nested: NotSerializableTransportException[[org.elasticsearch.transport.ConnectTransportException] [localhost][inet[/255.255.255.255:9300]] connect_timeout[30s]; connection timed out: /255.255.255.255:9300; ]; ]

elasticsearch.yml:

cloud:
  aws:
    access_key: awsAccessKey
    secret_key: awsSecretKey
    region: us-west-2
discovery:
  type: ec2
  ec2:
    groups: elasticsearch
    availability_zones: us-west-2a
    tag:
      Elasticsearch: tag
network.public_host: 255.255.255.255
network.publish_host: 255.255.255.255
discovery.zen.ping.multicast.enabled: false

Java node client:

Settings settings = ImmutableSettings.settingsBuilder()
    .put("node.name", nodeName)
    .put("cloud.aws.access_key", awsAccessKey)
    .put("cloud.aws.secret_key", awsSecretKey)
    .put("cloud.aws.region", "us-west-2")
    .put("cloud.node.auto_attributes", true)
    .put("discovery.type", "ec2")
    .put("discovery.ec2.groups", "elasticsearch")
    .put("discovery.ec2.availability_zones", "us-west-2a")
    .put("discovery.ec2.tag.Elasticsearch", "devvpc")
    .put("discovery.zen.ping.multicast.enabled", false)
    .build();
this.node = nodeBuilder()
    .clusterName(clusterName)
    .settings(settings)
    .client(true)
    .node();
this.client = node.client();
Combining Multiple Queries with 'OR' or 'AND'
Hi,

Question: I am trying to combine 2 user search queries using AND and OR as operations. I am aware of combining queries where we can merge filters, but I want to merge entire queries, like { {BIG Elastic Query1} AND {BIG Elastic Query2} }.

Details: For instance, say a user performs a search for "batman" in the movies type with filters of "Christian" and "Bale", and another query for "Dark Knight" in the tvshows type with a filter of "Christopher Nolan". I want to combine both queries so I can look for both batman movies and Dark Knight tvshows, but not Dark Knight movies or batman tvshows. For example, for the given queries I just want to run Query1 OR Query2 in Elasticsearch.

Query 1:

{
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "Batman",
          "default_operator": "AND",
          "fields": [ "Movies._all" ]
        }
      },
      "filter": {
        "bool": {
          "must": [
            { "query": {
                "filtered": {
                  "filter": {
                    "and": [
                      { "term": { "cast.firstName": "Christian" } },
                      { "term": { "cast.lastName": "Bale" } }
                    ]
                  }
                }
            } }
          ]
        }
      }
    }
  }
}

Query 2:

{
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "Dark Knight",
          "default_operator": "AND",
          "fields": [ "tvshows._all" ]
        }
      },
      "filter": {
        "bool": {
          "must": [
            { "query": {
                "filtered": {
                  "filter": {
                    "and": [
                      { "term": { "director.firstName": "Christopher" } },
                      { "term": { "director.lastName": "Nolan" } }
                    ]
                  }
                }
            } }
          ]
        }
      }
    }
  }
}
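A common way to get the {Query1} OR {Query2} semantics is to wrap each complete query clause in a top-level bool query — should for OR, must for AND. A sketch only; the elided bodies are the two full filtered queries from above, moved inside unchanged:

```json
{
  "query": {
    "bool": {
      "should": [
        { "filtered": { ...Query1 body... } },
        { "filtered": { ...Query2 body... } }
      ],
      "minimum_should_match": 1
    }
  }
}
```

Swapping should/minimum_should_match for a must array gives the AND combination instead.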
Search multiple indices in Kibana 4?
Hi all, Trying out Kibana 4's new release today, and I was wondering if this is still possible. In Kibana 3, you could simply comma-delimit all the index patterns you wanted to expose to your searches, but that doesn't seem possible in Kibana 4. I have indexes named like: company-customer-YYYYMMDD. I'd like to be able to search across company and customer if possible. It appears that the Configure an Index Pattern page allows for a single pattern, not multiples separated by commas. If I wanted to search these two indexes: companyA-customerA-YYYYMMDD and companyA-customerB-YYYYMMDD, but not companyA-customerC-YYYYMMDD — is that possible? Thanks! Chris