When searching for 'Boss' with fuzziness, I get a higher score for 'Bose' than for 'Boss'. How come?
Any ideas? -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/52e09e54-90b6-4014-8454-34e3db5756e5%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Just initialize shards when problems but no rebalance
Great, thank you. We are creating another cluster with more disk space to avoid these situations. By any chance, do you have the link to the issue?

2015-01-15 13:26 GMT-03:00 Kimbro Staken ksta...@kstaken.com:

I've experienced what you're describing. I called it a shard relocation storm and it's really tough to get under control. I opened a ticket on the issue and a fix was supposedly included in 1.4.2. What version are you running?

If you want to truly manually manage this situation you could set cluster.routing.allocation.disk.threshold_enabled to false, but that will likely cause other issues. I ended up just setting cluster.routing.allocation.disk.watermark.high to a really low value and actively managed shard allocations to prevent nodes from getting anywhere near that value. This is tricky, as the way ES allocates shards can easily run nodes out of disk if you're regularly creating new indices and those grow rapidly.

Kimbro

On Thu, Jan 15, 2015 at 6:14 AM, Matías Waisgold mwaisg...@gmail.com wrote:

Yes, I've seen that, but the problem is that when the threshold is reached it removes all shards from the server instead of removing just one and balancing. When that happens the cluster starts to move shards everywhere and it never stops. Another problem we are having is that in the file storage we see data from shards that are not assigned to that node, so it can't allocate anything in this dirty state.

2015-01-15 0:09 GMT-03:00 Mark Walkom markwal...@gmail.com:

You could do this, but it's a lot of manual overhead to have to deal with. However, ES does have some disk space awareness during allocation; take a look at http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/index-modules-allocation.html#disk

On 15 January 2015 at 10:57, Matías Waisgold mwaisg...@gmail.com wrote:

Hi, is there any setting I can give ES so that it automatically assigns shards that are unassigned but never rebalances the cluster? I've found several issues when rebalancing and prefer to do it manually. If I set cluster.routing.allocation.enable to none, nothing happens. If I set it to all, it starts rebalancing. Is it OK to combine cluster.routing.allocation.allow_rebalance set to none with cluster.routing.allocation.enable set to all?

The issue is mainly that we are running low on disk, and when that happens Elasticsearch removes all shards from an instance, ignores cluster.routing.allocation.cluster_concurrent_rebalance, and starts moving shards like crazy around the entire cluster, filling the storage on other instances so that it never stops balancing.

Kind regards
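The two disk-related settings named in this thread can be changed at runtime through the cluster settings API (PUT /_cluster/settings). The following is only a sketch of what such a request body could look like; the "70%" value and the choice of a transient rather than persistent setting are illustrative assumptions, not recommendations from the thread.

```python
import json


def disk_allocation_settings(threshold_enabled: bool, high_watermark: str) -> dict:
    """Build a body for PUT /_cluster/settings using the settings named in the thread.

    "transient" settings are lost on a full cluster restart; whether that is
    appropriate here is an assumption of this sketch.
    """
    return {
        "transient": {
            "cluster.routing.allocation.disk.threshold_enabled": threshold_enabled,
            "cluster.routing.allocation.disk.watermark.high": high_watermark,
        }
    }


# "70%" is a hypothetical value chosen only for illustration.
body = disk_allocation_settings(True, "70%")
print(json.dumps(body, indent=2))
```

The printed JSON would be sent as the request body; it is worth trying on a test cluster first, since the thread itself warns that disabling the threshold can cause other issues.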
Help creating a near real time streaming plugin to perform replication between clusters
Hey all,

I would like to create a plugin, and I need a hand. Below are the requirements I have.

- Our documents are immutable. They are only ever created or deleted; updates do not apply.
- We want mirrors of our ES cluster in multiple AWS regions. This way, if the WAN between regions is severed for any reason, we do not suffer an outage, just a delay in consistency.
- As documents are added or removed they are rolled up, then shipped in batches to the other AWS regions. This can be as fast as a few milliseconds or as slow as minutes, and will be user configurable. Note that a full backup+load is too slow; this is more of a near-real-time operation.
- This will sync the following operations:
  - Index creation/deletion
  - Alias creation/deletion
  - Document creation/deletion

What I'm thinking architecturally:

- The plugin is installed on each node in our cluster in all regions.
- The plugin will only gather changes for the primary shards on the local node.
- After the timeout elapses, the plugin will ship the changelog to the other AWS regions, where the plugin will receive it and process it.

Are there any APIs I can look at that are a good starting point for developing this? I'd like to do a simple prototype with two 1-node clusters reasonably soon. I found several plugin tutorials, but I'm more concerned with what part of the ES API I can call to receive events, if any.

Thanks,
Todd
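The "roll up, then ship after a timeout" part of the design above can be sketched independently of any Elasticsearch API. This is not plugin code and does not answer which ES hook delivers the events; the class name, the operation tuple, and the ship callback are all hypothetical names introduced for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical operation record: (action, index, doc_id). Documents are
# immutable in this design, so only "create" and "delete" are modelled.
Operation = Tuple[str, str, str]


@dataclass
class ChangelogBatcher:
    ship: Callable[[List[Operation]], None]       # hypothetical transport to a remote region
    max_delay_s: float = 1.0                      # user-configurable flush interval
    clock: Callable[[], float] = time.monotonic   # injectable so the logic can be tested
    _pending: List[Operation] = field(default_factory=list)
    _first_at: float = 0.0

    def record(self, action: str, index: str, doc_id: str) -> None:
        """Queue one change and flush if the oldest queued change is old enough."""
        if action not in ("create", "delete"):
            raise ValueError("only create/delete are supported")
        if not self._pending:
            self._first_at = self.clock()
        self._pending.append((action, index, doc_id))
        self.flush_if_due()

    def flush_if_due(self) -> None:
        if self._pending and self.clock() - self._first_at >= self.max_delay_s:
            self.flush()

    def flush(self) -> None:
        """Ship everything queued so far as one batch."""
        batch, self._pending = self._pending, []
        if batch:
            self.ship(batch)
```

A real implementation would also need a timer that calls flush_if_due when no new changes arrive, plus retry and ordering guarantees across regions, none of which are covered here.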
Re: Filter index to last 24h (REST)
A range filter on a date field, with something like from now/d-1 to now/d+1, might work, I think. If you don't have a date field (it could be the _timestamp field if you activated it), I'm afraid you can't do that.

-- David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr | @scrutmydocs https://twitter.com/scrutmydocs

On 15 Jan 2015, at 18:15, Matthew acernu...@gmail.com wrote:

Hi all, Is there any way to only load the last 24 hours of indices? I am trying to apply a query to only show the number of documents created over the last 24 hours (over the REST API), but I have not had too much luck. Thanks!
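As a rough illustration of the range approach, the sketch below builds a request body that could be posted to the _count endpoint of an index. It assumes the documents carry a date field; the field name "created_at" is hypothetical, and the bounds use the date-math expression now-24h rather than the day-rounded form mentioned above.

```python
import json


def last_24h_count_body(date_field: str) -> dict:
    """Build a body that matches documents whose date field falls in the last 24 hours."""
    return {
        "query": {
            "range": {
                date_field: {"gte": "now-24h", "lte": "now"}
            }
        }
    }


# "created_at" is a hypothetical field name; substitute the real date field.
body = last_24h_count_body("created_at")
print(json.dumps(body, indent=2))
```

If the indices are time-based (one per day, for example), restricting the request to the most recent index names would reduce the work further, but that depends on a naming scheme the thread does not describe.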
Filter index to last 24h (REST)
Hi all, Is there any way to only load the last 24 hours of indices? I am trying to apply a query to only show the number of documents created over the last 24 hours (over the REST API), but I have not had too much luck. Thanks!
Re: Aggregation - Blank and date aggregation
Then it means that you want to use a date_histogram aggregation with interval=day. See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html

On Thu, Jan 15, 2015 at 4:43 PM, buddarapu nagaraju budda08n...@gmail.com wrote:

Hey Adrien, thank you. I have one more question on aggregating on dates. We store a date-time in a field called createdDateTime, but I only need aggregates on the date part of the date-time. Any ideas? Or is there sample code that could help us?

Regards
Nagaraju
908 517 6981

On Wed, Jan 14, 2015 at 6:10 AM, Adrien Grand adrien.gr...@elasticsearch.com wrote:

On Wed, Jan 14, 2015 at 10:37 AM, buddarapu nagaraju budda08n...@gmail.com wrote:

Does the terms aggregation count blank field values?

Yes, an empty value counts as a term. Note that you need the field to be not analyzed for it to work (or to use an analyzer that emits empty strings). Otherwise the standard analyzer would analyze "" as an empty list of tokens, so a field value of "" would not actually count...

Is a terms aggregation enough for doing date aggregation? Or are there any specific aggregations for that? All I need in date aggregation is to know the different dates and their counts.

A terms aggregation is enough, but a date_histogram aggregation is generally more useful on dates, as there are lots of unique values and it's often more useful to group them based on the year, month or day.

-- Adrien Grand
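A minimal sketch of the suggested date_histogram request body, using the createdDateTime field from the question. The aggregation name "per_day" is an arbitrary label, and the "interval" parameter is the spelling referred to in this thread; later Elasticsearch versions may expect a different parameter name.

```python
import json


def daily_counts_body(date_field: str = "createdDateTime") -> dict:
    """Build a search body that buckets documents by calendar day of a date field."""
    return {
        "size": 0,  # only the aggregation result is wanted, not the hits
        "aggs": {
            "per_day": {  # arbitrary aggregation name
                "date_histogram": {"field": date_field, "interval": "day"}
            }
        },
    }


print(json.dumps(daily_counts_body(), indent=2))
```

Each returned bucket would then carry one day and the number of documents whose date-time falls on that day, which matches the "different dates and their counts" requirement.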
Re: How to find all docs where field_a === val1 and field_b === val2?
Thanks! I was thinking a bool query was something specific to fields with boolean values, which is why I didn't understand the bool query example in the docs. Your posts helped me get what I wanted. :)

On Wednesday, January 14, 2015 at 3:34:05 PM UTC-8, Brian wrote:

By the way, David, the full query follows:

    {
      "from": 0,
      "size": 20,
      "timeout": 6,
      "query": {
        "bool": {
          "must": [
            { "match": { "field_a": { "query": "val1", "type": "boolean" } } },
            { "match": { "field_b": { "query": "val2", "type": "boolean" } } }
          ]
        }
      },
      "version": true,
      "explain": false,
      "fields": [ "_ttl", "_source" ]
    }

Also note that since the _ttl field is being requested (always), the _source must also be asked for explicitly. If you don't ask for any fields, _source is returned by default. But if you ask for one or more fields explicitly, then you must also ask for _source or it won't be returned.

Brian

On Wednesday, January 14, 2015 at 6:31:29 PM UTC-5, Brian wrote:

David, this is what I use. I hope it helps.

    {
      "bool": {
        "must": [
          { "match": { "field_a": { "query": "val1", "type": "boolean" } } },
          { "match": { "field_b": { "query": "val2", "type": "boolean" } } }
        ]
      }
    }

Brian
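The same bool/must pattern can be generated for any number of field/value pairs. The helper below is a small sketch, not part of Brian's code; it uses the short form of the match query and leaves out the extra options (timeout, version, fields) shown in the full query.

```python
import json


def match_all_fields(criteria: dict) -> dict:
    """Build a bool query in which every field/value pair must match (logical AND)."""
    must = [{"match": {name: value}} for name, value in criteria.items()]
    return {"query": {"bool": {"must": must}}}


body = match_all_fields({"field_a": "val1", "field_b": "val2"})
print(json.dumps(body, indent=2))
```

Here "bool" refers to the boolean combination of clauses, not to the type of the fields, which is the confusion described at the top of this message.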
How can I add additional parameters in aggregation?
I have documents with id, name, and title fields. I am aggregating by name, but how can I also get the name and title in the results?
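One possible approach, offered here only as a sketch since the thread contains no answer, is to nest a top_hits sub-aggregation inside the terms aggregation so that each bucket also returns a sample document with the wanted fields. This assumes an Elasticsearch version that provides top_hits; the aggregation names "by_group" and "sample" are arbitrary.

```python
import json
from typing import List


def terms_with_fields(group_field: str, source_fields: List[str], hits_per_bucket: int = 1) -> dict:
    """Build a terms aggregation whose buckets each carry a few sample documents."""
    return {
        "size": 0,  # skip top-level hits; documents come back inside the buckets
        "aggs": {
            "by_group": {
                "terms": {"field": group_field},
                "aggs": {
                    "sample": {
                        "top_hits": {
                            "size": hits_per_bucket,
                            "_source": source_fields,  # restrict returned fields
                        }
                    }
                },
            }
        },
    }


print(json.dumps(terms_with_fields("name", ["name", "title"]), indent=2))
```

With hits_per_bucket set to 1 each name bucket returns one representative document; if titles differ within a bucket, a larger value or a second terms aggregation on title may be closer to what is needed.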
Re: Is ElasticSearch truly scalable for analytics?
Regarding the accuracy of top-k lists

This is perhaps an over-simplification - we deal with far more complex scenarios than a simple, single top-K list - we have whole aggregation trees with multiple layers of aggs: geo, time, nested, parent/child, percentiles, cardinalities etc. which can embed multiple top-K terms aggs, or be contained by one. Today all aggs work in one pass over local data to produce a merge-able summary output - if you introduce the idea of pausing all of this local computation mid-stream and then resuming it once you've centrally determined what top K is across a cluster and for various points in the agg tree, then coordinating all of these updates gets impossibly complex.

I acknowledge it is a highly specialised use-case which not very many people run into, but it is a case I'm currently working on.

To be fair, multi-level merging is a capability which might also apply to analytics in federated architectures, where proxy servers might act as the front to nodes in remote clusters.

I was thinking to reduce the complete set of buckets locally

I'm unclear on your approach to the reduce:

1) Take the summary outputs of multiple agg pipelines computed in parallel and merge them in the same way coordinating nodes do, or
2) Take the raw inputs (doc streams) from all shards held on a node and feed them through a single aggregation pipeline to get one combined output

The problems being that 1) loses accuracy and 2) loses any parallelism, because agg pipelines are single-threaded and must process doc streams serially. Because you claimed accuracy would be better, I guess you mean option 2?
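The accuracy concern with option 1 can be shown with a toy example. The sketch below is plain Python, not Elasticsearch code: each "shard" is a made-up term-count table, and merging truncated per-shard top-K lists is compared with counting over all the raw data.

```python
from collections import Counter
from typing import Dict, List, Tuple

Shard = Dict[str, int]


def merged_top_k(shards: List[Shard], k: int) -> List[Tuple[str, int]]:
    """Option 1: each shard reports only its local top k, then the summaries are merged."""
    total = Counter()
    for shard in shards:
        total.update(dict(Counter(shard).most_common(k)))
    return total.most_common(k)


def exact_top_k(shards: List[Shard], k: int) -> List[Tuple[str, int]]:
    """Option 2: all counts are combined before the top k is taken."""
    total = Counter()
    for shard in shards:
        total.update(shard)
    return total.most_common(k)


# Made-up counts: "c" is third on both shards but first overall.
shards = [
    {"a": 10, "b": 9, "c": 8},
    {"d": 10, "e": 9, "c": 8},
]
print("merged summaries:", merged_top_k(shards, 2))
print("exact:           ", exact_top_k(shards, 2))
```

In this data the term "c" never makes a local top 2, so the merged result misses it even though it has the highest global count of 16. Asking each shard for more than k terms narrows the gap but does not remove it in general.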
Re: Questions about scaling elasticsearch with regard to the number of documents indexed per second
Awesome! Great to know that. So, in conclusion, the steps will be:

1) Stream tweets from Twitter
2) Use the bulk API to make batches of 1000 (or more) tweets
3) Once the batch size is reached, spawn a new thread which will index the data into ES; meanwhile my original thread will continue streaming tweets

Do these steps sound alright to you, or did I miss something?

On Thursday, January 15, 2015 at 7:58:19 PM UTC+5:30, David Pilato wrote:

I can index on my laptop 1-12000 docs per second. SSD drives of course.

-- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 15 Jan 2015, at 13:43, Chinch Pokli cpo...@gmail.com wrote:

No, so the whole point was: will Elasticsearch be able to index, say, 10,000 documents per second? If yes, I can simply hook up my Twitter code to ES. If not, I would need to think of how to make that happen. Typically I've seen ES index just around 30 docs per second, which is pretty low. I am hoping Redis/Kafka/Logstash/etc. might give Elasticsearch some breathing room and enable it to index up to 10K docs per second.

On Thursday, January 15, 2015 at 10:47:31 AM UTC+5:30, David Pilato wrote:

You have a Twitter input so you can extract content from Twitter and send it to Elasticsearch. No need to have Redis here.

-- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 15 Jan 2015, at 00:02, Chinch Pokli cpo...@gmail.com wrote:

Thanks. I'll have a look at the raw option. Regarding Logstash, I don't fully understand its utility. It says that it can take messages from a Redis server. But if I have to set up Redis, I could simply use the Redis river to index into Elasticsearch. Is there any additional benefit that Logstash would give me?

On Thursday, January 15, 2015 at 4:06:12 AM UTC+5:30, David Pilato wrote:

You should look at the raw option, or better, look at Logstash. My 2 cents.

David

On 14 Jan 2015, at 23:29, Chinch Pokli cpo...@gmail.com wrote:

Hi, I am using Elasticsearch to index the Twitter stream. Until recently I was using the official river, which was working great, but I realized that it is throwing out much of the data (e.g. it is not storing the number of followers etc.). Is there a way to make the river store all the data? If not, I am fine with writing streaming code which will stream and index. But I have a concern: how many documents can Elasticsearch index per second? I might eventually need to index almost 10,000 documents (each document = 2 KB) per second (the current requirement is 100 documents per second). Is this even feasible? If yes, do I need to make any special modifications? Thanks in advance!!
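The three steps listed at the top of this message can be sketched as follows. This is only an outline of the batching and hand-off logic: index_batch is a hypothetical callable standing in for whatever actually sends a bulk request to Elasticsearch, and no real client library is used.

```python
import threading
from typing import Any, Callable, Iterable, List


def stream_in_batches(
    docs: Iterable[Any],
    index_batch: Callable[[List[Any]], None],  # hypothetical: sends one bulk request
    batch_size: int = 1000,
) -> int:
    """Collect docs into batches and index each batch on its own thread.

    Returns the number of batches handed off. The calling thread keeps
    consuming the stream while earlier batches are being indexed.
    """
    workers: List[threading.Thread] = []
    batch: List[Any] = []

    def hand_off(ready: List[Any]) -> None:
        worker = threading.Thread(target=index_batch, args=(ready,))
        worker.start()
        workers.append(worker)

    for doc in docs:
        batch.append(doc)
        if len(batch) >= batch_size:
            hand_off(batch)
            batch = []  # start a fresh list; the worker owns the old one
    if batch:
        hand_off(batch)  # flush the final partial batch
    for worker in workers:
        worker.join()
    return len(workers)
```

Spawning one thread per batch is the literal reading of step 3; with a fast stream and a slow cluster that can pile up threads, so a bounded worker pool or queue is probably a safer design in practice.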
Re: Is ElasticSearch truly scalable for analytics?
I would also be very interested in node-level reduction of shard results, not for scalability but for precision. I would like an option for a node to do complete aggregations on its shards so the results are exact rather than approximate. There are many use cases where the corpus of data is relatively small, fits on one powerful node, and exactness is a MUST. With 48-core servers and SSD drives such a node can process a good deal of data and produce exact results, which is a must for traditional datamart-like apps. Having this option would allow this class of apps to be built, and in a multi-node setup it would provide better precision too.
Re: Questions about scaling elasticsearch with regard to the number of documents indexed per second
Sounds good. If you are using Java, you could also look at the river code. Note that you should use the BulkProcessor class, which is super handy.

BTW I said 1/s, but not for tweets. I have fewer fields (20) than Twitter (100). With more fields, I guess it would take more time, though with better machines it could work. I'd say that you need to test on the production cluster.

-- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 15 Jan 2015, at 15:40, Chinch Pokli cpo...@gmail.com wrote:

Awesome! Great to know that. So, in conclusion, the steps will be:

1) Stream tweets from Twitter
2) Use the bulk API to make batches of 1000 (or more) tweets
3) Once the batch size is reached, spawn a new thread which will index the data into ES; meanwhile my original thread will continue streaming tweets

Do these steps sound alright to you, or did I miss something?
Re: need help for search
No worries about your English. Sorry, I missed your gist. Based on your examples, it sounds like you are French. Are you aware of the French mailing list? https://groups.google.com/forum/?hl=frfromgroups#!forum/elasticsearch-fr

It would help a lot if you could simplify, with some sample data and small queries, what you are trying to do and what does not work. So remove all analyzers, as I guess that's not really your concern at this stage. Try with only two or three fields.

-- David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr | @scrutmydocs https://twitter.com/scrutmydocs

On 15 Jan 2015, at 17:13, Thibaut Owczarz thib...@1001pharmacies.com wrote:

Hi, regarding the structure I sent in my gist, my question is just this: I have a single search field, and nothing tells me what kind of value is typed into it. But I need one request like this:

    {
      "query": {
        "bool": {
          "must": [],
          "must_not": [],
          "should": [
            { "term": { "sku": "$datasearch" } },
            { "term": { "internal_code": "$datasearch" } },
            { "match": { "firstname": "$datasearch" } },
            { "match": { "lastname": "$datasearch" } },
            { "match": { "address": "$datasearch" } },
            { "match": { "city": "$datasearch" } },
            { "match": { "localized_description": "$datasearch" } },
            { "match": { "localized_keywords": "$datasearch" } },
            { "match": { "service.localized_label": "$datasearch" } },
            { "match": { "medias.localized_label": "$datasearch" } },
            { "match": { "services.localized_label": "$datasearch" } }
          ]
        }
      }
    }

Examples:

- If $datasearch is a sku, I should get exactly the one user with this sku.
- If $datasearch is a firstname, I should get the list of users who have this firstname.
- If $datasearch is a keyword, I should get the list of users who have this keyword.
- I use term for sku and internal_code because I must not be able to search on part of them (if my sku is 1234, typing 123 should find no result).
- And finally, in my data I have these users:
  [1 - charles martin, who has localized_keywords = moto, licorne, cheval, course]
  [2 - henry martin, who has localized_keywords = pétanque, chevaux, basket, parieur]
  I want my request to return both users if $datasearch = cheval.

I hope I have made myself understood; my English may be bad. Thanks
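The single-search-box request described in this thread can be generated from one input value. The sketch below only rebuilds the query shape quoted in the thread (term clauses for exact identifiers, match clauses for text fields); the explicit minimum_should_match of 1 is an addition of this sketch to state that at least one clause must match.

```python
import json

# Field names are taken from the query in the thread.
EXACT_FIELDS = ["sku", "internal_code"]
TEXT_FIELDS = [
    "firstname",
    "lastname",
    "address",
    "city",
    "localized_description",
    "localized_keywords",
    "service.localized_label",
    "medias.localized_label",
    "services.localized_label",
]


def build_search(user_input: str) -> dict:
    """Build one bool query that tries the input against every searchable field."""
    should = [{"term": {name: user_input}} for name in EXACT_FIELDS]
    should += [{"match": {name: user_input}} for name in TEXT_FIELDS]
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


print(json.dumps(build_search("cheval"), indent=2))
```

Whether "cheval" also finds a document containing "chevaux" depends on how localized_keywords is analyzed (for example with a French stemming analyzer), which is a mapping question this query alone cannot settle.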
Re: need help for search
Could you reproduce this with a full test case so we understand exactly what you are doing? Maybe simplify your test. See elasticsearch.org/help

-- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 15 Jan 2015, at 16:01, Thibaut Owczarz thib...@1001pharmacies.com wrote:

I agree, but my search input does not say whether it is a sku, an internal_code or another field. If I do this, it's OK:

    {
      "query": {
        "bool": {
          "must": [
            { "term": { "sku": "01b3ae496c0142f993cf131c607fe003" } }
          ],
          "must_not": [],
          "should": [
            { "term": { "internal_code": "01b3ae496c0142f993cf131c607fe003" } },
            { "match": { "firstname": "01b3ae496c0142f993cf131c607fe003" } },
            { "match": { "lastname": "01b3ae496c0142f993cf131c607fe003" } },
            { "match": { "address": "01b3ae496c0142f993cf131c607fe003" } },
            { "match": { "city": "01b3ae496c0142f993cf131c607fe003" } },
            { "match": { "localized_description": "01b3ae496c0142f993cf131c607fe003" } },
            { "match": { "localized_keywords": "01b3ae496c0142f993cf131c607fe003" } },
            { "match": { "service.localized_label": "01b3ae496c0142f993cf131c607fe003" } },
            { "match": { "medias.localized_label": "01b3ae496c0142f993cf131c607fe003" } },
            { "match": { "services.localized_label": "01b3ae496c0142f993cf131c607fe003" } }
          ]
        }
      }
    }

But if I now search with an internal_code:

    {
      "query": {
        "bool": {
          "must": [
            { "term": { "sku": 3401598272746 } }
          ],
          "must_not": [],
          "should": [
            { "term": { "internal_code": 3401598272746 } },
            { "match": { "firstname": 3401598272746 } },
            { "match": { "lastname": 3401598272746 } },
            { "match": { "address": 3401598272746 } },
            { "match": { "city": 3401598272746 } },
            { "match": { "localized_description": 3401598272746 } },
            { "match": { "localized_keywords": 3401598272746 } },
            { "match": { "service.localized_label": 3401598272746 } },
            { "match": { "medias.localized_label": 3401598272746 } },
            { "match": { "services.localized_label": 3401598272746 } }
          ]
        }
      }
    }

my request is bad.

On Thursday, 15 January 2015 at 15:49:56 UTC+1, David Pilato wrote:

I guess it's most likely because you added all your filters in the should clause instead of must?

-- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 15 Jan 2015, at 15:36, Thibaut Owczarz thi...@1001pharmacies.com wrote:

I found my first error: there is no need for user, because I am already searching in user. But why, when I search for a specific sku, do I not get only one result?

    curl -XPOST 'http://localhost:9200/test_fr/user/_search' -d '{
      "query": {
        "bool": {
          "must": [],
          "must_not": [],
          "should": [
            { "term": { "sku": "01b3ae496c0142f993cf131c607fe003" } },
            { "term": { "internal_code": "01b3ae496c0142f993cf131c607fe003" } },
            { "match": { "firstname": "01b3ae496c0142f993cf131c607fe003" } },
            { "match": { "lastname": "01b3ae496c0142f993cf131c607fe003" } },
            { "match": { "address": "01b3ae496c0142f993cf131c607fe003" } },
            { "match": { "city": "01b3ae496c0142f993cf131c607fe003" } },
            { "match": { "localized_description": "01b3ae496c0142f993cf131c607fe003" } },
            { "match": {
Re: Aggregation - Blank and date aggregation
Hey Adrien,

Thank you. I have one more question, about aggregating on dates. We store the date and time in a field called createdDateTime, but I need to aggregate only on the date part of that value. Any ideas? Sample code would help us.

Regards,
Nagaraju
908 517 6981

On Wed, Jan 14, 2015 at 6:10 AM, Adrien Grand adrien.gr...@elasticsearch.com wrote:

On Wed, Jan 14, 2015 at 10:37 AM, buddarapu nagaraju budda08n...@gmail.com wrote:

> Does a terms aggregation count blank field values?

Yes, an empty value counts as a term. Note that you need the field to be not analyzed for it to work (or to use an analyzer that emits empty strings). Otherwise the standard analyzer would analyze "" as an empty list of tokens, so a field value of "" would not actually count.

> Is a terms aggregation enough for aggregating on dates, or are there specific aggregations for that? All I need from the date aggregation is the distinct dates and their counts.

A terms aggregation is enough, but a date_histogram aggregation is generally more useful on dates, as there are lots of unique values and it's often more useful to group them based on the year, month or day.

-- Adrien Grand
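A minimal sketch of the date_histogram suggestion above, written as a Python dict that could be sent as a search request body. It assumes the 1.x-era `interval` parameter (later Elasticsearch versions renamed it); the aggregation name `per_day` and the function name are made up for illustration, and only the field name createdDateTime comes from the thread.

```python
import json


def daily_counts_aggregation(field="createdDateTime"):
    """Build a search body that counts documents per calendar day."""
    return {
        "size": 0,  # only the aggregation buckets are wanted, not the hits
        "aggs": {
            "per_day": {  # hypothetical aggregation name
                "date_histogram": {
                    "field": field,
                    "interval": "day",       # group by the date part only
                    "format": "yyyy-MM-dd",  # render bucket keys without the time
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(daily_counts_aggregation(), indent=2))
```

Each bucket in the response then corresponds to one day, with its document count, regardless of the time of day stored in the field.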
Re: When searching for 'Boss' with fuzziness, get higher score for 'Bose' than 'Boss'. ???? How Comes !?!?
This is because the score takes two factors into account: the document frequency and the edit distance. Quite likely in your case, even though Boss is closer to the query than Bose, Bose has a much lower document frequency, which eventually gave it a better score. I guess we should have another rewrite method that would not take frequencies into account (or would somehow merge them) to avoid that issue.

-- Adrien Grand

On Thu, Jan 15, 2015 at 4:06 PM, Eylon Steiner eylon.stei...@gmail.com wrote:

Any ideas?
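The interaction of the two factors can be illustrated with a toy model. This is not Lucene's actual scoring code: the boost formula, the idf formula, and the document frequencies below are all assumptions chosen only to show how a rare near-match can outrank a common exact match.

```python
import math


def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def toy_score(query, term, doc_freq, num_docs):
    """Toy score: an edit-distance boost multiplied by an idf-style weight."""
    # Assumed boost: 1.0 for an exact match, shrinking with each edit.
    boost = 1.0 - edit_distance(query, term) / min(len(query), len(term))
    # TF-IDF style inverse document frequency: rare terms weigh more.
    idf = 1.0 + math.log(num_docs / (doc_freq + 1))
    return boost * idf


NUM_DOCS = 100_000
DOC_FREQ = {"boss": 5_000, "bose": 20}  # hypothetical document frequencies
scores = {t: toy_score("boss", t, df, NUM_DOCS) for t, df in DOC_FREQ.items()}

if __name__ == "__main__":
    for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(term, round(score, 2))
```

With these made-up numbers the rare term "bose" ends up ahead of the exact but common term "boss". With equal document frequencies the exact match wins, which is the behaviour the original poster expected.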
elasticsearch-hadoop for spark, index documents from a RDD in different index by day: myindex-2014-01-01 for example
Hi,

I work on a complex workflow using Spark (parsing, cleaning, machine learning). At the end of the workflow I want to send aggregated results to Elasticsearch so that my portal can query the data. There will be two types of processing: streaming, and the possibility to relaunch the workflow on all available data.

Right now I use elasticsearch-hadoop, and particularly its Spark support, to send documents to Elasticsearch with the saveJsonToEs(myindex, mytype) method. The target is to have one index per day, using the proper template that we built.

AFAIK, elasticsearch-hadoop cannot use a field of a document to route that document to the proper index. What is the proper way to implement this? Should I add a special step using Spark and the bulk API, so that each executor sends its documents to the proper index based on a field of each line? Is there something that I missed in elasticsearch-hadoop?

Julien
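The "special step" idea above relies on the fact that every action line in a bulk request can name its own index. A minimal sketch of building such a request body in plain Python follows. It is not elasticsearch-hadoop code; the function name and the `day` field are hypothetical, and the index prefix and type simply reuse the myindex/mytype names from the question.

```python
import json


def bulk_lines(docs, index_prefix="myindex", doc_type="mytype", day_field="day"):
    """Build a bulk request body that routes each document to a per-day index."""
    lines = []
    for doc in docs:
        # e.g. "myindex-2014-01-01", taken from a field of the document itself
        index_name = "%s-%s" % (index_prefix, doc[day_field])
        lines.append(json.dumps({"index": {"_index": index_name, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    if not lines:
        return ""
    # The bulk format is newline-delimited and must end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [
        {"day": "2014-01-01", "value": 1},
        {"day": "2014-01-02", "value": 2},
    ]
    print(bulk_lines(sample), end="")
```

In a Spark job, a function like this could be called once per partition and the resulting body posted to the `_bulk` endpoint by each executor. Batching, retries, and error handling are left out of the sketch.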
Re: need help for search
Hi,

Based on the structure I sent in my gist, my question is just this: I have a single search field, and nothing tells me what kind of value is typed into it. I need one request like this:

    {
      "query": {
        "bool": {
          "must": [],
          "must_not": [],
          "should": [
            { "term":  { "sku": "$datasearch" } },
            { "term":  { "internal_code": "$datasearch" } },
            { "match": { "firstname": "$datasearch" } },
            { "match": { "lastname": "$datasearch" } },
            { "match": { "address": "$datasearch" } },
            { "match": { "city": "$datasearch" } },
            { "match": { "localized_description": "$datasearch" } },
            { "match": { "localized_keywords": "$datasearch" } },
            { "match": { "service.localized_label": "$datasearch" } },
            { "match": { "medias.localized_label": "$datasearch" } },
            { "match": { "services.localized_label": "$datasearch" } }
          ]
        }
      }
    }

Examples:

- If $datasearch is a sku, I directly get the one user with this sku.
- If $datasearch is a firstname, I get the list of users who have this firstname.
- If $datasearch is a keyword, I get the list of users who have this keyword.
- I use term for sku and internal_code because partial matches are not wanted on these fields (if my sku is 1234, typing 123 must not find a result).
- Finally, my data contains these users:
  [1 - charles martin, with localized_keywords = moto, licorne, cheval, course]
  [2 - henry martin, with localized_keywords = pétanque, chevaux, basket, parieur]
  I want my request to return both users if $datasearch = cheval.

I hope I have made myself understood; my English may be bad. Thanks
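The single-search-box request described in this message can be generated from one input string. The sketch below is only an illustration in Python: the field names come from the thread, while the function name and the explicit `minimum_should_match` setting are assumptions. It does not address analyzers, so whether "cheval" also finds "chevaux" still depends on how the fields are mapped.

```python
import json

# Fields that must match exactly (no partial matches), per the message above.
EXACT_FIELDS = ["sku", "internal_code"]

# Analyzed text fields searched with a match query.
TEXT_FIELDS = [
    "firstname",
    "lastname",
    "address",
    "city",
    "localized_description",
    "localized_keywords",
    "service.localized_label",
    "medias.localized_label",
    "services.localized_label",
]


def build_search(datasearch):
    """Build one bool query that tries the input against every field."""
    should = [{"term": {field: datasearch}} for field in EXACT_FIELDS]
    should += [{"match": {field: datasearch}} for field in TEXT_FIELDS]
    return {
        "query": {
            "bool": {
                "should": should,
                # Assumption: state explicitly that at least one clause must match.
                "minimum_should_match": 1,
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_search("cheval"), indent=2))
```

Keeping every clause in should means a document matches as soon as any one field matches, which fits an input whose type is unknown. Moving the sku clause into must, as tried elsewhere in this thread, restricts results to documents with that exact sku.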
Re: elasticsearch-hadoop for spark, index documents from a RDD in different index by day: myindex-2014-01-01 for example
My previous idea doesn't seem to work. It is not possible to send documents directly to _bulk, only to an index/type pattern.
Re: Just initialize shards when problems but no rebalance
I've experienced what you're describing. I called it a shard relocation storm, and it's really tough to get under control. I opened a ticket on the issue and a fix was supposedly included in 1.4.2. What version are you running?

If you want to truly manage this situation manually, you could set cluster.routing.allocation.disk.threshold_enabled to false, but that will likely cause other issues. I ended up just setting cluster.routing.allocation.disk.watermark.high to a really low value and actively managed shard allocations to prevent nodes from getting anywhere near that value. This is tricky: the way ES allocates shards, it can easily run nodes out of disk if you're regularly creating new indices and those grow rapidly.

Kimbro

On Thu, Jan 15, 2015 at 6:14 AM, Matías Waisgold mwaisg...@gmail.com wrote:

Yes, I've seen that, but the problem is that when the threshold is reached it removes all shards from the server instead of removing just one and rebalancing. When that happens, the cluster starts to move shards everywhere and never stops. Another problem we are having is that in the file storage we see data from shards that are not assigned to that node, so it can't allocate anything in this dirty state.

2015-01-15 0:09 GMT-03:00 Mark Walkom markwal...@gmail.com:

You could do this, but it's a lot of manual overhead to have to deal with. However, ES does have some disk space awareness during allocation; take a look at http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/index-modules-allocation.html#disk

On 15 January 2015 at 10:57, Matías Waisgold mwaisg...@gmail.com wrote:

Hi,

Is there any setting I can give ES so that it automatically assigns shards that are unassigned but never, ever rebalances the cluster? I've found several issues when rebalancing and prefer to do it manually.

If I set cluster.routing.allocation.enable to none, nothing happens. If I set it to all, then it starts rebalancing. Is it OK to combine cluster.routing.allocation.allow_rebalance set to none with cluster.routing.allocation.enable set to all?

The issue is mainly that we are running low on disk. When that happens, Elasticsearch removes all shards from an instance, ignores cluster.routing.allocation.cluster_concurrent_rebalance, and starts moving shards like crazy around the entire cluster, filling the storage on other instances in such a way that it never stops balancing.

Kind regards
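As an illustration of the two disk-related settings named in the reply above, the sketch below builds the body of a cluster settings update (sent with PUT to /_cluster/settings) as a Python dict. The function name is hypothetical and the "70%" watermark is an arbitrary example value, not a recommendation from the thread.

```python
import json


def disk_allocation_settings(high_watermark="70%", threshold_enabled=True):
    """Build a transient cluster-settings body for the disk allocation decider."""
    return {
        "transient": {
            # Setting this to False turns the disk-based allocation checks off.
            "cluster.routing.allocation.disk.threshold_enabled": threshold_enabled,
            # Above this disk usage, ES tries to move shards off the node.
            "cluster.routing.allocation.disk.watermark.high": high_watermark,
        }
    }


if __name__ == "__main__":
    print(json.dumps(disk_allocation_settings(), indent=2))
```

Transient settings are lost on a full cluster restart, which makes them a reasonable place to experiment before committing to a value.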
Re: need help for search
I guess it's most likely because you added all your filters in the should clause instead of must?

-- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 15 Jan 2015 at 15:36, Thibaut Owczarz thib...@1001pharmacies.com wrote:

I found my first error: there is no need for the user. prefix, because I am already searching in user. But why, when I search for a specific sku, do I not get only one result?

    curl -XPOST 'http://localhost:9200/test_fr/user/_search' -d '{
      "query": {
        "bool": {
          "must": [],
          "must_not": [],
          "should": [
            { "term":  { "sku": "01b3ae496c0142f993cf131c607fe003" } },
            { "term":  { "internal_code": "01b3ae496c0142f993cf131c607fe003" } },
            { "match": { "firstname": "01b3ae496c0142f993cf131c607fe003" } },
            { "match": { "lastname": "01b3ae496c0142f993cf131c607fe003" } },
            { "match": { "address": "01b3ae496c0142f993cf131c607fe003" } },
            { "match": { "city": "01b3ae496c0142f993cf131c607fe003" } },
            { "match": { "localized_description": "01b3ae496c0142f993cf131c607fe003" } },
            { "match": { "localized_keywords": "01b3ae496c0142f993cf131c607fe003" } },
            { "match": { "service.localized_label": "01b3ae496c0142f993cf131c607fe003" } },
            { "match": { "medias.localized_label": "01b3ae496c0142f993cf131c607fe003" } },
            { "match": { "services.localized_label": "01b3ae496c0142f993cf131c607fe003" } }
          ]
        }
      }
    }';

It returns all my users. Thanks

On Thursday, 15 January 2015 at 14:58:16 UTC+1, Thibaut Owczarz wrote:

Hello,

I am starting to learn Elasticsearch, and I have trouble understanding how search works. Could anyone help me? The gist with all my structure and my data is here: https://gist.github.com/thibaut1001/7a3000c3ff371be3a52d

My problem is only in part 4, searching multiple fields for a value, like this:

    ## We need to search for henry in the selected fields
    curl -XPOST 'http://localhost:9200/test_fr/user/_search' -d '{
      "query": {
        "bool": {
          "must": [],
          "must_not": [],
          "should": [
            { "term": { "user.sku": "henry" } },
            { "term": { "user.internal_code": "henry" } },
            { "term": { "user.firstname": "henry" } },
            { "term": { "user.lastname": "henry" } },
            { "term": { "user.address": "henry" } },
            { "term": { "user.city": "henry" } },
            { "term": { "user.localized_description": "henry" } },
            { "term": { "user.localized_keywords": "henry" } },
            { "term": { "user.service.localized_label": "henry" } },
            { "term": { "user.medias.localized_label": "henry" } },
            { "term": { "user.services.localized_label": "henry" } }
          ]
        }
      }
    }';
    ## Returns no results

Why? I have many questions. Could you help me please? Thanks
Re: elasticsearch-hadoop for spark, index documents from a RDD in different index by day: myindex-2014-01-01 for example
I think I have a solution: build the JSON documents so that I can send them directly to _bulk with saveJsonToEs(_bulk). I'm not sure whether it will be efficient, or even whether it will work. I'll try.
Re: need help for search
I agree, but my search input does not say whether it is a sku, an internal_code, or some other field. If I do this, it works:

    {
      "query": {
        "bool": {
          "must": [
            { "term": { "sku": "01b3ae496c0142f993cf131c607fe003" } }
          ],
          "must_not": [],
          "should": [
            { "term":  { "internal_code": "01b3ae496c0142f993cf131c607fe003" } },
            { "match": { "firstname": "01b3ae496c0142f993cf131c607fe003" } },
            { "match": { "lastname": "01b3ae496c0142f993cf131c607fe003" } },
            { "match": { "address": "01b3ae496c0142f993cf131c607fe003" } },
            { "match": { "city": "01b3ae496c0142f993cf131c607fe003" } },
            { "match": { "localized_description": "01b3ae496c0142f993cf131c607fe003" } },
            { "match": { "localized_keywords": "01b3ae496c0142f993cf131c607fe003" } },
            { "match": { "service.localized_label": "01b3ae496c0142f993cf131c607fe003" } },
            { "match": { "medias.localized_label": "01b3ae496c0142f993cf131c607fe003" } },
            { "match": { "services.localized_label": "01b3ae496c0142f993cf131c607fe003" } }
          ]
        }
      }
    }

But if I now search with an internal_code:

    {
      "query": {
        "bool": {
          "must": [
            { "term": { "sku": "3401598272746" } }
          ],
          "must_not": [],
          "should": [
            { "term":  { "internal_code": "3401598272746" } },
            { "match": { "firstname": "3401598272746" } },
            { "match": { "lastname": "3401598272746" } },
            { "match": { "address": "3401598272746" } },
            { "match": { "city": "3401598272746" } },
            { "match": { "localized_description": "3401598272746" } },
            { "match": { "localized_keywords": "3401598272746" } },
            { "match": { "service.localized_label": "3401598272746" } },
            { "match": { "medias.localized_label": "3401598272746" } },
            { "match": { "services.localized_label": "3401598272746" } }
          ]
        }
      }
    }

my request does not work.
Re: need help for search
Thanks for the pointer to the elasticsearch-fr mailing list. Tomorrow I will put together a small, simple data set and post the request I want to run and the result I need. Thanks

On Thursday, 15 January 2015 at 17:31:28 UTC+1, David Pilato wrote:

No worries about your English. Sorry, I missed your gist. Based on your examples, it sounds like you are French. Are you aware of the French mailing list? https://groups.google.com/forum/?hl=frfromgroups#!forum/elasticsearch-fr

It would help a lot if you could show, with some sample data and small queries, what you are trying to do and what does not work. So remove all the analyzers, as I guess they are not really your concern at this stage. Try with only two or three fields.

-- David Pilato | Technical Advocate | Elasticsearch.com http://Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr | @scrutmydocs https://twitter.com/scrutmydocs
Re: Just initialize shards when problems but no rebalance
So is this still happening with 1.4.2? Here's the ticket. Looks like the fix was supposed to be in 1.4.1: https://github.com/elasticsearch/elasticsearch/issues/8538

On Thu, Jan 15, 2015 at 10:55 AM, Matías Waisgold mwaisg...@gmail.com wrote:

Great, thank you. We are creating another cluster with more disk space to avoid these situations. By any chance, do you have the link to the issue?
Re: real time match analysis
I was able to identify which field matched via explain, but I couldn't see any information on which token filter was the reason for the match. I've tried specifying the analyzer name that the field uses as well as not specifying one. If explain is supposed to provide this data, I will give it another go and set up a test index with simpler analyzer setups.

Also, in order to do this, I will need to run the explain separately from the search itself. My ultimate goal is to be able to do this within milliseconds (less than 10). Is this feasible with explain?

On Wednesday, January 14, 2015 at 12:51:15 PM UTC-8, Nikolas Everett wrote:

What about explain?

On Wed, Jan 14, 2015 at 3:24 PM, Ed Kim edk...@gmail.com wrote:

Just a friendly bump to see if anyone has any feedback. :)

On Saturday, January 10, 2015 at 10:38:34 PM UTC-8, Ed Kim wrote:

Hello all, I was wondering if anyone could offer some feedback on whether there is a way to determine how a document matched in real time. I currently use custom analyzers at index time to allow a broad array of matches for a given text field. I try to match based on phrases, synonyms, substrings, stemming, etc. of a given phrase, and I would like to be able to figure out, at search time, which analyzer caused the match.

Currently, I've gotten around this by creating child documents where the fields are fanned out to their respective analyzer types. So I have a child document where the field only applies stemming, another that uses only synonyms, etc. However, due to the growing number of fields that require analysis and the growth of my data set, I'd much prefer to have fewer (and less complex) documents. I was hoping there would be a way to tag tokens at the analysis phase that could be used at the search phase to quickly determine my match level, but I was not able to find anything like this.

Having said that, has anyone else ever tried to figure this out, or does anyone have any thoughts on how to leverage ES at a lower level to determine the match?
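For readers following the explain suggestion, here is a minimal sketch of a search body that asks for a scoring explanation on every hit. The field name `title` and the query text are placeholders, not taken from the thread. The resulting `_explanation` describes scoring per field and term; as reported above, it does not name the token filter that produced a term.

```python
import json


def explained_search(field, text):
    """Build a search body that asks Elasticsearch to explain each hit's score."""
    return {
        "explain": True,  # adds an _explanation tree to every hit
        "query": {"match": {field: text}},
    }


# Placeholder field name and query text, for illustration only.
body = explained_search("title", "running shoes")
print(json.dumps(body, indent=2))
```

The body would be sent to the index's `_search` endpoint. Running explain on every search adds work per hit, so the 10 ms target would need to be measured rather than assumed.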
Re: Seeking a Director of Data Engineering in Austin TX
Hi Traci,

This is a community-based technical list. We'd greatly appreciate it if you didn't post job ads.

On 16 January 2015 at 03:38, Traci Martin traci@gmail.com wrote:

Hello All! I am a recruiter in Austin, TX trying to fill a Director of Data Engineering position for my client, also in Austin. They are ELK stack evangelists and would prefer someone with at least some knowledge of Lucene or Hadoop. This is really a great company to work for and probably the nicest client I have had the pleasure of working with. It is a permanent position offering great benefits, a laid-back atmosphere, and a very competitive salary with options. If you are interested please feel free to contact me.

*There will be no re-lo provided and no sponsorship at this time.

Traci Martin
512-640-3656
tmar...@intersysconsulting.com

*Director Data Engineering*

*Who we are:*
*Intersys Consulting* is a leading Business Intelligence, Data Management, and Application Development professional services organization focused on providing solutions with real business value. We provide a customer-focused approach to building authentic partnerships with our clients with objective counsel from concept to deployment for a consistent voice through the dynamic IT environment.

*What we look for:*
*Intersys Consulting* is focused on finding and cultivating talent across the IT space. We have over 100 developers, project managers, business analysts, and data management professionals, most with over ten years of experience in their respective fields. In new hires we look for authenticity; be proud of who you are and what you bring to the table, as well as those candidates who consistently deliver the highest quality product and have a deep desire to improve not just themselves, but the organization as a whole.

*The Position:*
Intersys Consulting is seeking a Director of Data Engineering to work at our client site in Austin, Texas.

*Primary Responsibilities:*
- Build and optimize each component of our data pipeline
- Work with our data scientists to provide data in the optimal format
- Work with our DevOps team to ensure the data infrastructure is reliable and scalable
- Integrate with our data partners to enrich our first-party data with third-party sources
- Stay on top of cutting-edge technologies to constantly improve and streamline our data systems

*Qualifications:*
- Experience with high performance, high traffic web systems
- Experience with monitoring systems: New Relic, ELK stack, etc.
- Experience with either Hadoop or Elasticsearch/Lucene and a desire and willingness to learn the other
Re: ElasticSearch: access document nested value in groovy script
I figured this out. I had to use _source.medals to access the nested documents, which are stored on disk and not in memory. Thanks.

On Wednesday, January 14, 2015 at 10:55:15 AM UTC-8, Anil Kumar wrote:

I have a document stored in ElasticSearch as below.

    "_source": {
        "firstname": "John",
        "lastname": "Smith",
        "medals": [
            { "bucket": 100, "count": 1 },
            { "bucket": 150, "count": 2 }
        ]
    }

I can access a string value inside a document using doc.firstname in a scripted metric aggregation http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-scripted-metric-aggregation.html . But I am not able to get the field value using doc.medals[0].bucket. Can you please help me out and let me know how to access the values inside nested fields?
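To make the `_source.medals` fix concrete, here is a sketch of a scripted metric aggregation body that sums `count` over the medals array. The aggregation name `total_medals` and the Groovy script bodies are my own illustration, modelled on the documentation linked above, and have not been run against a cluster. The last lines mirror the same arithmetic locally on the sample document from the question.

```python
import json

# Request body: the Groovy scripts read medals from _source, as described above.
body = {
    "size": 0,
    "aggs": {
        "total_medals": {  # hypothetical aggregation name
            "scripted_metric": {
                "init_script": "_agg.counts = []",
                "map_script": "for (m in _source.medals) { _agg.counts.add(m.count) }",
                "combine_script": "sum = 0; for (c in _agg.counts) { sum += c }; return sum",
                "reduce_script": "sum = 0; for (a in _aggs) { sum += a }; return sum",
            }
        }
    },
}
print(json.dumps(body, indent=2))

# Local mirror of what the scripts compute for the sample document.
doc = {
    "firstname": "John",
    "lastname": "Smith",
    "medals": [{"bucket": 100, "count": 1}, {"bucket": 150, "count": 2}],
}
local_total = sum(m["count"] for m in doc["medals"])
print(local_total)
```

Reading `_source` means loading and parsing the stored document for every hit, which is consistent with the on-disk remark above and is likely slower than `doc[...]` access.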
Re: Using icu_collation plugin in Unit Tests
Thanks David! Sorry, I am new to the ES world. Where would I download the JAR file from, and what class should I be using for icu_collation?

Thank you very much,
Kumar Subramanian

On Thursday, January 15, 2015 at 12:52:12 PM UTC-8, David Pilato wrote:

You most likely just need to add it as a dependency, which is easy if you are using Maven.

David

On 15 Jan 2015 at 21:03, Kumar S krsku...@gmail.com wrote:

Hi, I am new to ES. I am using NodeBuilder in my unit test to run a local instance of ES. I would like to use the icu_collation plugin. How can I install and run the plugin from within this local instance? Is there an API that I should use? If not, what are the different ways I can do this?

Thank you very much,
Kumar Subramanian
Excluding Terms Using a Minus Sign
Is there a way to exclude a term if the user precedes it with a minus sign, the way Google does? For example, if I want to search for the word louvre, but I don't want the museum in France, I can search for *louvre -museum* as my search terms. Does ES support this? I am not finding anything like that in the documentation.

Thanks All!
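For what it's worth, the `query_string` query uses Lucene query syntax, in which a leading `-` marks a term that must not match, so user input of this shape can be passed through as-is. A minimal sketch follows; the helper name `user_query` and the choice of the `_all` field are assumptions for illustration.

```python
import json


def user_query(text, field="_all"):
    """Wrap raw user input in a query_string query; '-term' excludes that term."""
    return {
        "query": {
            "query_string": {
                "default_field": field,
                "query": text,
            }
        }
    }


body = user_query("louvre -museum")
print(json.dumps(body, indent=2))
```

Because `query_string` parses the whole input, malformed user text can cause a parse error. `simple_query_string` is a more forgiving alternative that, as far as I know, also accepts `-` for negation.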
Using icu_collation plugin in Unit Tests
Hi, I am new to ES. I am using NodeBuilder in my unit test to run a local instance of ES. I would like to use the icu_collation plugin. How can I install and run the plugin from within this local instance? Is there an API that I should use? If not, what are the different ways I can do this?

Thank you very much,
Kumar Subramanian
Re: Just initialize shards when problems but no rebalance
I'm on 1.4.1 and still seeing the same behavior. There should be a better approach than removing all shards at the same time and then trying to move a few. We are going to apply the same solution you mentioned: add more disk. Thanks for your help.

2015-01-15 16:09 GMT-03:00 Kimbro Staken ksta...@kstaken.com:

So is this still happening with 1.4.2? Here's the ticket. Looks like the fix was supposed to be in 1.4.1: https://github.com/elasticsearch/elasticsearch/issues/8538

On Thu, Jan 15, 2015 at 10:55 AM, Matías Waisgold mwaisg...@gmail.com wrote:

Great, thank you. We are creating another cluster with more disk space to avoid these situations. By any chance do you have the link to the issue?

2015-01-15 13:26 GMT-03:00 Kimbro Staken ksta...@kstaken.com:

I've experienced what you're describing. I called it a shard relocation storm and it's really tough to get under control. I opened a ticket on the issue and a fix was supposedly included in 1.4.2. What version are you running?

If you want to truly manually manage this situation you could set cluster.routing.allocation.disk.threshold_enabled to false, but that will likely cause other issues. I ended up just setting cluster.routing.allocation.disk.watermark.high to a really low value and actively managed shard allocations to prevent nodes from getting anywhere near that value. This is tricky, as the way ES allocates shards can easily run nodes out of disk if you're regularly creating new indices and those grow rapidly.

Kimbro

On Thu, Jan 15, 2015 at 6:14 AM, Matías Waisgold mwaisg...@gmail.com wrote:

Yes, I've seen that, but the problem is that when the threshold is reached it removes all shards from the server instead of just removing one and balancing. And when that happens the cluster starts to move shards everywhere and it never stops. Another problem we are having is that in the file storage we see data from shards that are not assigned to that node, so it can't allocate anything in this dirty state.

2015-01-15 0:09 GMT-03:00 Mark Walkom markwal...@gmail.com:

You could do this, but it's a lot of manual overhead to have to deal with. However, ES does have some disk space awareness during allocation; take a look at http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/index-modules-allocation.html#disk

On 15 January 2015 at 10:57, Matías Waisgold mwaisg...@gmail.com wrote:

Hi, is there any setting I can give ES so that it automatically assigns shards that are unassigned but never ever rebalances the cluster? I've found several issues when rebalancing and prefer to do it manually.

If I set cluster.routing.allocation.enable to none, nothing happens. If I set it to all, then it starts rebalancing. Is it OK to combine cluster.routing.allocation.allow_rebalance set to none with cluster.routing.allocation.enable set to all?

The issue is mainly because we are running low on disk, and when that happens elasticsearch removes all shards from an instance. That doesn't respect cluster.routing.allocation.cluster_concurrent_rebalance and starts moving shards like crazy around the entire cluster, filling the storage on other instances in such a way that it never stops balancing.

Kind regards
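The two disk settings Kimbro mentions can be changed at runtime through the cluster settings API. Below is a minimal sketch of the request body; the helper name and the "70%" value are illustrative assumptions, not recommendations from the thread.

```python
import json


def disk_allocation_settings(high_watermark, threshold_enabled=True):
    """Build a transient cluster-settings body from the two disk settings named above."""
    return {
        "transient": {
            "cluster.routing.allocation.disk.threshold_enabled": threshold_enabled,
            "cluster.routing.allocation.disk.watermark.high": high_watermark,
        }
    }


# "70%" is a placeholder for "a really low value"; pick one that suits the cluster.
settings = disk_allocation_settings("70%")
print(json.dumps(settings, indent=2))
```

The body would be sent with `PUT /_cluster/settings`. Transient settings do not survive a full cluster restart, so a `persistent` block may be the better fit for a long-term policy.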
filtering/querying on script field
Is it possible to filter or query on script_fields? If so, can you provide an example?
Re: Using icu_collation plugin in Unit Tests
You most likely just need to add it as a dependency, which is easy if you are using Maven.

David

On 15 Jan 2015 at 21:03, Kumar S krskumar...@gmail.com wrote:

Hi, I am new to ES. I am using NodeBuilder in my unit test to run a local instance of ES. I would like to use the icu_collation plugin. How can I install and run the plugin from within this local instance? Is there an API that I should use? If not, what are the different ways I can do this?

Thank you very much,
Kumar Subramanian
Re: Help creating a near real time streaming plugin to perform replication between clusters
While it seems quite easy to attach listeners to an ES node to capture operations translog-style and push out index/delete operations at shard level somehow, there is more to consider for a reliable solution.

The Couchbase developers have added a data replication protocol to their product which is meant for transporting changes over long distances with latency for in-memory processing. To learn about the most important features, see https://github.com/couchbaselabs/dcp-documentation and http://docs.couchbase.com/admin/admin/Concepts/dcp.html

I think bringing such a concept of an inter-cluster protocol into ES could be a good starting point for sketching the complete path of such an ambitious project beforehand.

Most challenging could be dealing with back pressure when receiving nodes/clusters become slow. For a solution to this, reactive Java / reactive streams look like a viable possibility. See also https://github.com/ReactiveX/RxJava/wiki/Backpressure and http://www.ratpack.io/manual/current/streams.html

I'm in favor of Ratpack since it comes with Java 8, Groovy, Google Guava, and Netty, which resembles the ES stack.

In ES, there is not much coded for inter-cluster communication afaik, except snapshot/restore. Maybe snapshot/restore can provide everything you want, with incremental mode. Lucene will offer numbered segment files for faster incremental snapshot/restore.

Just my 2¢

Jörg

On Thu, Jan 15, 2015 at 7:00 PM, Todd Nine tn...@apigee.com wrote:

Hey all, I would like to create a plugin, and I need a hand. Below are the requirements I have.

- Our documents are immutable. They are only ever created or deleted; updates do not apply.
- We want mirrors of our ES cluster in multiple AWS regions. This way, if the WAN between regions is severed for any reason, we do not suffer an outage, just a delay in consistency.
- As documents are added or removed they are rolled up, then shipped in batches to the other AWS regions. This can be as fast as a few milliseconds or as slow as minutes, and will be user configurable. Note that a full backup+load is too slow; this is more of a near-real-time operation.
- This will sync the following operations:
  - Index creation/deletion
  - Alias creation/deletion
  - Document creation/deletion

What I'm thinking architecturally:

- The plugin is installed on each node in our cluster in all regions
- The plugin will only gather changes for the primary shards on the local node
- After the timeout elapses, the plugin will ship the changelog to the other AWS regions, where the plugin will receive it and process it

Are there any APIs I can look at that are a good starting point for developing this? I'd like to do a simple prototype with two 1-node clusters reasonably soon. I found several plugin tutorials, but I'm more concerned with what part of the ES API I can call to receive events, if any.

Thanks,
Todd
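The roll-up-then-ship step in Todd's outline can be sketched independently of Elasticsearch. The class below is a hypothetical helper (every name in it is mine) that collects create/delete operations and releases them as one batch once a configurable interval has elapsed. It does not show how to capture events from ES, which is the open question in the thread, and it leaves out the back-pressure handling Jörg raises.

```python
import time


class ChangeBatcher:
    """Collects create/delete operations and releases them in timed batches."""

    def __init__(self, flush_interval_s, clock=time.monotonic):
        self.flush_interval_s = flush_interval_s
        self._clock = clock  # injectable so the timing can be tested
        self._pending = []
        self._last_flush = clock()

    def record(self, op, index, doc_id):
        # Documents are immutable in this design, so updates are rejected.
        if op not in ("create", "delete"):
            raise ValueError("only create and delete operations are supported")
        self._pending.append({"op": op, "index": index, "id": doc_id})

    def poll(self):
        """Return the pending batch if the interval has elapsed, else None."""
        now = self._clock()
        if not self._pending or now - self._last_flush < self.flush_interval_s:
            return None
        batch, self._pending = self._pending, []
        self._last_flush = now
        return batch


# Usage with a fake clock so the example is deterministic.
fake_now = [0.0]
batcher = ChangeBatcher(flush_interval_s=5.0, clock=lambda: fake_now[0])
batcher.record("create", "events", "1")
batcher.record("delete", "events", "0")
early = batcher.poll()   # interval not yet elapsed
fake_now[0] = 6.0
batch = batcher.poll()   # interval elapsed, both operations released
print(early, batch)
```

A real plugin would also need to persist the pending list, since an in-memory batch is lost if the node dies before shipping.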
Kibana and nested documents -- include_in_parent
Hello, I am new to ElasticSearch and I have a very specific question. We have implemented our ElasticSearch cluster with a nested document structure. Each document is made of one ID, a key element, and one field containing several nested records that are inserted by the script API and the bulk update function.

My question is: is it possible to view nested documents in Kibana without using *include_in_parent*? From preliminary testing it seems to use more disk space when include_in_parent is in the mappings. When include_in_parent is not in the mappings, the documents are not viewable within Kibana 4.0.0.

Also, is there a function or way to display which documents have the most nested records, using the size of the nested records in the document? I would like to have a pie chart that could display them by the size of their nested attribute.

Thank you in advance.
Re: How to remove a cluster setting?
This is a known issue, see https://github.com/elasticsearch/elasticsearch/issues/6732

On 15 January 2015 at 22:01, Gary Gao garygaow...@gmail.com wrote:

Why didn't this work on my ES?

    GET /_cluster/settings

    {
        "persistent": {
            "discovery": {
                "zen": { "minimum_master_nodes": "2" }
            }
        },
        "transient": {
            "indices": {
                "recovery": {
                    "translog_size": "1024kb",
                    "concurrent_streams": "3",
                    "translog_ops": "2000",
                    "max_bytes_per_sec": "400mb",
                    "file_chunk_size": "1024kb"
                }
            }
        }
    }

    PUT _cluster/settings
    { "transient": { "indices.recovery.translog_size": "" } }

Response:

    { "acknowledged": true, "persistent": {}, "transient": {} }

When I do GET again, this setting still exists.

On Tuesday, July 22, 2014 at 8:50:10 AM UTC+8, Jeffrey Zhou wrote:

I made the following setting to my Elasticsearch cluster in order to decommission some old nodes in the cluster. After removing these old nodes, I now need to re-enable the cluster to allocate shards on those '10.0.6.*' nodes. Does anyone know how to remove this setting?

    PUT /_cluster/settings
    { "transient": { "cluster.routing.allocation.exclude._ip": "10.0.6.*" } }

Thanks in advance for any help!
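Both requests in this thread follow the same pattern: PUT an empty value for the key you want to neutralize. A small sketch of building such a body is below; the helper name is hypothetical. Whether the server then drops the key or merely blanks it depends on the setting and the version, as the linked issue discusses, so this only shows the shape of the request.

```python
import json


def clear_settings_body(keys, scope="transient"):
    """Build a cluster-settings body that assigns an empty string to each key."""
    if scope not in ("transient", "persistent"):
        raise ValueError("scope must be 'transient' or 'persistent'")
    return {scope: {key: "" for key in keys}}


body = clear_settings_body(["cluster.routing.allocation.exclude._ip"])
print(json.dumps(body, indent=2))
```

For the IP exclusion case an empty value should at least stop matching any node. Since transient settings are not kept across a full cluster restart, a restart is the blunt fallback for the rest.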
Best practices for index, search and updates
I am new to Elasticsearch. Can somebody give me some ideas about the best practices one should follow to get good performance for indexing, search, and updates?
Re: filtering/querying on script field
Hi Samatha,

I don't think so, because a script field is computed from the fields of each hit document, that is, from the results of the query/filter. You can use a script filter instead: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-script-filter.html#query-dsl-script-filter

Masaru

On January 16, 2015 at 04:40:49, samatha kankipati (samatha.kankip...@gmail.com) wrote:

Is it possible to filter or query on script_fields? If so, can you provide an example?
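A script filter, as suggested above, might look like the following sketch. It uses the 1.x `filtered` query form from the linked documentation page; the field name `price`, the parameter names, and the helper name are placeholders of mine.

```python
import json


def script_filtered_search(script, params):
    """Build a filtered query whose filter is a script evaluated per document."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "script": {
                        "script": script,
                        "params": params,
                    }
                },
            }
        }
    }


# 'price', 'factor' and 'threshold' are illustrative names only.
body = script_filtered_search(
    "doc['price'].value * factor > threshold",
    {"factor": 1.2, "threshold": 100},
)
print(json.dumps(body, indent=2))
```

The same expression can be repeated under `script_fields` if the computed value should also be returned with each hit.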
Re: Slow Commands with 1.2.4 to 1.4.2 Upgrade
Just added 2 more nodes with the same specs, and still seeing the same slowness. These commands no longer return anything, because it's taking too long to return.

On Tuesday, December 30, 2014 at 3:54:34 PM UTC-8, Mark Walkom wrote:

How slow? Is the load on your system high?

On 31 December 2014 at 05:04, psk...@gmail.com wrote:

I have about 50 GB of data (1 mil docs) in a single node: 8 cores with 32 GB (24 GB heap). I just upgraded from 1.2.4 to 1.4.2, and I noticed that a few commands take a long time to return, and Marvel doesn't work as well as it used to. Some of the commands that are slow for me are _cat/indices and _nodes.
questions regarding elasticsearch-spark
Hi all,

I'm quite familiar with ElasticSearch but new to Spark and elasticsearch-spark. My idea at this moment is that by using Spark together with Elasticsearch, it might be possible to increase search performance when the time interval is fixed.

My question is: does Hadoop need to be set up first to use elasticsearch-spark? Does it depend on Hadoop by any means?

Sincerely,
Re: questions regarding elasticsearch-spark
Hi Lee,

No, Hadoop isn't required. You can use Spark standalone mode (https://spark.apache.org/docs/1.2.0/spark-standalone.html) when running ElasticSearch on Spark.

Regards
Ravi

On Thu, Jan 15, 2015 at 10:15 PM, Seungjin Lee sweetest0...@gmail.com wrote:

Hi all,

I'm quite familiar with ElasticSearch but new to Spark and elasticsearch-spark. My idea at this moment is that by using Spark together with Elasticsearch, it might be possible to increase search performance when the time interval is fixed.

My question is: does Hadoop need to be set up first to use elasticsearch-spark? Does it depend on Hadoop by any means?

Sincerely,
Changing the Axis (X - Y) Label | Naming Legend
Hi,

1. Is there any way we can change the labels of the X and Y axes?
2. In Kibana 3 it was possible to name the legends. Is there any way we can do this in Kibana 4?
Perl client: Cannot combine params and body?
I have a remote node that I am attempting to connect to that requires an API key as a URL parameter in addition to the body in order to get it to work. The code is as follows:

    #!/usr/bin/perl
    use v5.14;
    use warnings;
    use Search::Elasticsearch;
    use Data::Dumper;

    my $API_KEY = 'API_KEY';

    # Client pointed at the remote HTTPS endpoint
    my $ES = Search::Elasticsearch->new(
        cxn_pool => 'Static::NoPing',
        nodes    => [{
            scheme => 'https',
            host   => 'service.host.com',
            port   => 443,
            path   => '/api/es/a_path',
        }],
        #send_get_body_as => 'POST',
        trace_to => 'Stdout',
        log_to   => 'Stdout',
    );

    # Search with the API key as a URL parameter plus a request body
    my $res = $ES->search(
        params => { api_key => $API_KEY, },
        body   => {
            query => {
                bool => {
                    must => {
                        query_string => {
                            default_field    => "_all",
                            query            => "thisisasitethatdoesntexist.com",
                            default_operator => "AND"
                        }
                    }
                }
            }
        }
    );

    print Dumper($res);

The generated curl is:

    # Request to: https://service.host.com:443/api/es/a_path
    curl -XGET 'http://localhost:9200/_search?api_key=API_KEY&pretty=1' -d '
    {
       "query" : {
          "bool" : {
             "must" : {
                "query_string" : {
                   "query" : "thisisasitethatdoesntexist.com",
                   "default_field" : "_all",
                   "default_operator" : "AND"
                }
             }
          }
       }
    }
    '

When I replace localhost and the path with the proper host and path and run the curl command directly from the command line, I get zero hits back, which is what I expect. If I run the above Perl, however, I get many millions of results back, which is exactly the same as what I get when I remove the body from the curl query (-d ''). So it seems that the combination of params and body causes the body to get eaten? I looked at the code, but I couldn't find where this might be happening. Any help?
Re: Sorting on nested object collections
I've run the query with the smallest possible subset, and it returns the results in the expected order, so it appears to be correct. My biggest question is: does the second sort condition know to run on the *first* projected valuation, the one that had the max date from the first sort condition?
Re: Complex search
Take a look at highlighting (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-highlighting.html) for highlighting the relevant parts of matches, and at multifield search queries (http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/multi-match-query.html#_boosting_individual_fields) with boosting on individual fields.

On Friday, 16 January 2015 08:19:08 UTC+11, Serge Schumacher wrote:

Hi, I'm looking to create a search behaviour like Amazon's. I have an index with 3 fields: Title, Description and Category. I want to search the Title and Description fields for the word *car*, and I would like to get scored results like this:

    car      -- score: 1   in category vehicles
    autocar  -- score: 0,5 in category vehicles, where the part "car" should be highlighted, e.g. auto*car*
    carradio -- score: 0,5 in category vehicles, where the part "car" should be highlighted, e.g. *car*radio

In addition, if the word is found in the Title field, the score should be higher than if the word is found only in the Description field. Is there anybody out there who could help me with this topic, or at least point me in the right direction? Thanks, Serge
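The two suggestions above (boosting one field in a multi-field query, and asking for highlights) can be combined in a single request body. The following is only a sketch: the field names Title and Description come from the question, while the function name `build_search_body` and the boost factor of 2 are illustrative assumptions. It builds and prints the JSON body and does not contact a cluster.

```python
import json


def build_search_body(term, title_boost=2):
    """Build a search body over Title and Description.

    title_boost is an assumed value; "Title^2" asks Elasticsearch to
    weight matches in Title twice as heavily as matches in Description.
    """
    return {
        "query": {
            "multi_match": {
                "query": term,
                "fields": [f"Title^{title_boost}", "Description"],
            }
        },
        # Ask for highlighted fragments from both searched fields.
        "highlight": {
            "fields": {"Title": {}, "Description": {}}
        },
    }


body = build_search_body("car")
print(json.dumps(body, indent=2))
```

Note that this body alone does not make "car" match inside "autocar" or "carradio"; whether partial words match depends on how the fields are analyzed at index time, which the thread does not cover.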
How can we achieve an equivalent of this SQL query in Elasticsearch?
What would be the equivalent of the following query in the Elasticsearch world?

    select myDate, col1, col2 from myTable where myDate = (select max(myDate) from myTable)
Re: How can we achieve an equivalent of this SQL query in Elasticsearch?
I think you need to run two queries for now. The first is a max aggregation. The second uses the result of that aggregation to search for documents. My 2 cents.

-- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr | @scrutmydocs https://twitter.com/scrutmydocs

On 15 Jan 2015, at 09:13, Lokesh Gupta lgup...@gmail.com wrote:

What will be equivalent of the following query in the Elasticsearch world..

    select myDate, col1, col2 from myTable where myDate = (select max(myDate) from myTable)
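The two-query approach can be sketched as two request bodies. This is a hedged illustration, not a tested recipe: the field names myDate, col1 and col2 come from the SQL in the question, while the function names, the aggregation name `max_date`, and the hand-written `sample_agg_response` are assumptions standing in for a real cluster response.

```python
import json


def max_date_request(date_field="myDate"):
    """Step 1: ask only for the maximum value of the date field."""
    return {
        "size": 0,  # no hits needed, only the aggregation result
        "aggs": {"max_date": {"max": {"field": date_field}}},
    }


def docs_at_date_request(max_value, date_field="myDate",
                         columns=("myDate", "col1", "col2")):
    """Step 2: fetch the documents whose date equals the maximum."""
    return {
        "query": {"term": {date_field: max_value}},
        "_source": list(columns),
    }


# Hand-written stand-in for the response to step 1 (assumed shape).
sample_agg_response = {
    "aggregations": {
        "max_date": {
            "value": 1421280000000.0,
            "value_as_string": "2015-01-15T00:00:00.000Z",
        }
    }
}

agg = sample_agg_response["aggregations"]["max_date"]
# Prefer the formatted date if the response carries one.
max_value = agg.get("value_as_string", agg["value"])

print(json.dumps(max_date_request()))
print(json.dumps(docs_at_date_request(max_value)))
```

Because these are two separate requests, a document indexed between them can change the true maximum; the second query then returns the documents for the value seen in step 1.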
Re: How to remove a cluster setting?
Why didn't this work on my ES?

    GET /_cluster/settings
    {
      "persistent": { "discovery": { "zen": { "minimum_master_nodes": 2 } } },
      "transient": {
        "indices": {
          "recovery": {
            "translog_size": "1024kb",
            "concurrent_streams": 3,
            "translog_ops": 2000,
            "max_bytes_per_sec": "400mb",
            "file_chunk_size": "1024kb"
          }
        }
      }
    }

    PUT _cluster/settings
    { "transient": { "indices.recovery.translog_size": "" } }

Response:

    { "acknowledged": true, "persistent": {}, "transient": {} }

When I do the GET again, this setting still exists.

On Tuesday, July 22, 2014 at 8:50:10 AM UTC+8, Jeffrey Zhou wrote:

I applied the following setting to my Elasticsearch cluster in order to decommission some old nodes. Now that these old nodes have been removed, I need to re-enable the cluster to allocate shards on those '10.0.6.*' nodes. Does anyone know how to remove this setting?

    PUT /_cluster/settings
    { "transient": { "cluster.routing.allocation.exclude._ip": "10.0.6.*" } }

Thanks in advance for any help!
Re: How can I sort results by _id?
Making it "index": "not_analyzed" should work. What is the issue with the results? Note that loading the _id in fielddata is typically very costly, since the _id field is typically unique per document.

On Thu, Jan 15, 2015 at 10:35 AM, Jason Zhang moc...@gmail.com wrote:

I use a query DSL like:

    { "filter": { "exists": { "field": "info" } }, "sort": { "_id": "desc" } }

And the _id here is an integer like '123'. But the result is like:

    { "took": 50, ... "hits": { ... "hits": [ { ... "sort": [ null ] }] } }

Also, I've tried to add

    "_id": { "index": "not_analyzerd" }

in the _mapping. This time the sort section returns values, but I find the results are still partly unordered. Can I sort results by _id? How? Thank you.

-- Adrien Grand
Re: how to combine aggregations
I believe you could run a terms aggregation on the city field, and under this terms aggregation put two sum aggregations, one for clicks and one for displays. Finally, you could derive the click rate from the sums of clicks and displays on the client side. If you are just starting to play with aggregations, I would recommend reading this blog post by Zachary Tong: http://www.elasticsearch.org/blog/intro-to-aggregations/

On Wed, Jan 14, 2015 at 10:43 PM, Yan Georget y...@ogury.co wrote:

Hello, let's imagine I am logging displays and clicks, say by cities. I can aggregate those by countries and I can also compute grand totals. Now I would like to compute click rates (clicks/displays) by cities and by countries, and I would also like to get a global click rate. How can I do this? It seems that I could use a scripted metric (I have not tried yet), but I would also like to expose these rates in Kibana. Is it possible? Thanks in advance, Yan Georget

-- Adrien Grand
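The suggestion above, a terms aggregation with two sum sub-aggregations and the rate derived on the client, might look like the following sketch. The field names `city`, `clicks` and `displays`, the aggregation name `by_group`, and the hand-written `sample_response` are assumptions for illustration; no cluster is contacted.

```python
def click_rate_request(group_field="city"):
    """Terms aggregation on group_field with summed clicks and displays."""
    return {
        "size": 0,  # only aggregation results are needed
        "aggs": {
            "by_group": {
                "terms": {"field": group_field},
                "aggs": {
                    "clicks": {"sum": {"field": "clicks"}},
                    "displays": {"sum": {"field": "displays"}},
                },
            }
        },
    }


def click_rates(response):
    """Derive clicks/displays per bucket on the client side."""
    rates = {}
    for bucket in response["aggregations"]["by_group"]["buckets"]:
        displays = bucket["displays"]["value"]
        clicks = bucket["clicks"]["value"]
        # Avoid dividing by zero when a group has no displays.
        rates[bucket["key"]] = clicks / displays if displays else None
    return rates


# Hand-written stand-in for an aggregation response (assumed shape).
sample_response = {
    "aggregations": {
        "by_group": {
            "buckets": [
                {"key": "paris", "doc_count": 3,
                 "clicks": {"value": 5.0}, "displays": {"value": 100.0}},
                {"key": "lyon", "doc_count": 1,
                 "clicks": {"value": 0.0}, "displays": {"value": 0.0}},
            ]
        }
    }
}

rates = click_rates(sample_response)
print(rates)
```

A global rate can be computed the same way by summing the per-bucket clicks and displays before dividing; averaging the per-city rates would weight small cities too heavily.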
Out of memory on start with 38GB index
Hi, I am doing all my tests on a 38GB production index copy, with ES 1.4.2. I tried several memory settings and virtual machine sizes, but ES fails to start on a Linux system with 48GB memory and 32GB for ES heap. Searching for similar issues, I encountered https://github.com/elasticsearch/elasticsearch/issues/8394 which is still open and looks fairly similar to my problem. The debug output at startup looks like this:

    [2015-01-14 12:00:48,710][DEBUG][indices.cluster ] [Saint Elmo] [mailspool][1] creating shard
    [2015-01-14 12:00:48,710][DEBUG][index.service] [Saint Elmo] [mailspool] creating shard_id [1]
    [2015-01-14 12:00:48,791][DEBUG][index.deletionpolicy ] [Saint Elmo] [mailspool][1] Using [keep_only_last] deletion policy
    [2015-01-14 12:00:48,793][DEBUG][index.merge.policy ] [Saint Elmo] [mailspool][1] using [tiered] merge mergePolicy with expunge_deletes_allowed[10.0], floor_segment[2mb], max_merge_at_once[10], max_merge_at_once_explicit[30], max_merged_segment[5gb], segments_per_tier[10.0], reclaim_deletes_weight[2.0]
    [2015-01-14 12:00:48,794][DEBUG][index.merge.scheduler] [Saint Elmo] [mailspool][1] using [concurrent] merge scheduler with max_thread_count[2], max_merge_count[4]
    [2015-01-14 12:00:48,797][DEBUG][index.shard.service ] [Saint Elmo] [mailspool][1] state: [CREATED]
    [2015-01-14 12:00:48,797][DEBUG][index.translog ] [Saint Elmo] [mailspool][1] interval [5s], flush_threshold_ops [2147483647], flush_threshold_size [200mb], flush_threshold_period [30m]
    [2015-01-14 12:00:48,801][DEBUG][index.shard.service ] [Saint Elmo] [mailspool][1] state: [CREATED]->[RECOVERING], reason [from gateway]
    [2015-01-14 12:00:48,801][DEBUG][index.gateway] [Saint Elmo] [mailspool][1] starting recovery from local ...
    [2015-01-14 12:00:48,805][DEBUG][river.cluster] [Saint Elmo] processing [reroute_rivers_node_changed]: execute
    [2015-01-14 12:00:48,805][DEBUG][river.cluster] [Saint Elmo] processing [reroute_rivers_node_changed]: no change in cluster_state
    [2015-01-14 12:00:48,814][INFO ][gateway ] [Saint Elmo] recovered [1] indices into cluster_state
    [2015-01-14 12:00:48,814][DEBUG][cluster.service ] [Saint Elmo] processing [local-gateway-elected-state]: done applying updated cluster_state (version: 2)
    [2015-01-14 12:00:48,840][DEBUG][index.engine.internal] [Saint Elmo] [mailspool][1] starting engine
    [2015-01-14 12:00:58,406][DEBUG][cluster.service ] [Saint Elmo] processing [routing-table-updater]: execute
    [2015-01-14 12:00:58,407][DEBUG][gateway.local] [Saint Elmo] [mailspool][4]: throttling allocation [[mailspool][4], node[null], [P], s[UNASSIGNED]] to [[[Saint Elmo][gOgAuHo4SXyfyuPpws0Usw][es][inet[/172.16.45.250:9300 on primary allocation
    [2015-01-14 12:00:58,407][DEBUG][gateway.local] [Saint Elmo] [mailspool][2]: throttling allocation [[mailspool][2], node[null], [P], s[UNASSIGNED]] to [[[Saint Elmo][gOgAuHo4SXyfyuPpws0Usw][es][inet[/172.16.45.250:9300 on primary allocation
    [2015-01-14 12:00:58,407][DEBUG][gateway.local] [Saint Elmo] [mailspool][3]: throttling allocation [[mailspool][3], node[null], [P], s[UNASSIGNED]] to [[[Saint Elmo][gOgAuHo4SXyfyuPpws0Usw][es][inet[/172.16.45.250:9300 on primary allocation
    [2015-01-14 12:00:58,408][DEBUG][gateway.local] [Saint Elmo] [mailspool][0]: throttling allocation [[mailspool][0], node[null], [P], s[UNASSIGNED]] to [[[Saint Elmo][gOgAuHo4SXyfyuPpws0Usw][es][inet[/172.16.45.250:9300 on primary allocation
    [2015-01-14 12:00:58,408][DEBUG][cluster.service ] [Saint Elmo] processing [routing-table-updater]: no change in cluster_state
    [2015-01-14 12:01:31,619][WARN ][index.engine.internal] [Saint Elmo] [mailspool][1] failed engine [refresh failed]
    java.lang.OutOfMemoryError: Java heap space
        at org.apache.lucene.util.FixedBitSet.<init>(FixedBitSet.java:187)
        at org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(MultiTermQueryWrapperFilter.java:104)
        at org.elasticsearch.index.cache.filter.weighted.WeightedFilterCache$FilterCacheFilterWrapper.getDocIdSet(WeightedFilterCache.java:177)
        at org.elasticsearch.common.lucene.search.OrFilter.getDocIdSet(OrFilter.java:55)
        at org.elasticsearch.common.lucene.search.ApplyAcceptedDocsFilter.getDocIdSet(ApplyAcceptedDocsFilter.java:46)
        at org.apache.lucene.search.FilteredQuery$1.scorer(FilteredQuery.java:130)
        at org.apache.lucene.search.FilteredQuery$RandomAccessFilterStrategy.filteredScorer(FilteredQuery.java:542)
        at org.apache.lucene.search.FilteredQuery$1.scorer(FilteredQuery.java:136)
        at org.apache.lucene.search.QueryWrapperFilter$1.iterator(QueryWrapperFilter.java:59)
        at org.apache.lucene.index.BufferedUpdatesStream.applyQueryDeletes(BufferedUpdatesStream.java:554)
        at
Grandchild is not getting fetched by parent id
I am experiencing an issue while trying to retrieve a grandchild record by its parent ID (child-grandchild relationship). The number of hits in the result is always zero, although the same request works fine for the parent-child relationship. My records are organized like this:

    Account --(one to one)-- User --(one to one)-- Address

My execution environment is:

- Fedora 21 CE
- openjdk 1.8.0_25
- ES 1.4.2

Here is a script that shows the problem:

    # index creation
    curl -XPUT 'localhost:9200/the_index/' -d '{ "mappings": { "account" : {}, "user" : { "_parent" : { "type" : "account" } }, "address" : { "_parent" : { "type" : "user" } } } }';

    # mrsmith account creation
    curl -XPUT 'localhost:9200/the_index/account/mrsmith' -d '{ "foo" : "foo" }';

    # john user creation
    curl -XPUT 'localhost:9200/the_index/user/john?parent=mrsmith' -d '{ "bar" : "bar" }';

    # smithshouse address creation
    curl -XPUT 'localhost:9200/the_index/address/smithshouse?parent=john' -d '{ "baz" : "baz" }';

    # Here I am trying to retrieve a record. Getting zero hits.
    curl -XGET 'localhost:9200/the_index/address/_search?pretty' -d '{ "query" : { "bool" : { "must" : { "term" : { "_parent" : "john" } } } } }';

    # Another approach with has_parent query type. Still getting zero hits.
    curl -XGET 'localhost:9200/the_index/address/_search?pretty' -d '{ "query" : { "has_parent" : { "parent_type" : "user", "query" : { "term" : { "_id" : "john" } } } } }';

    # OK, let's try a routed search. Nope.
    curl -XGET 'localhost:9200/the_index/address/_search?routing=john&pretty' -d '{ "query" : { "bool" : { "must" : { "term" : { "_parent" : "john" } } } } }';

    # Routed has_parent query. Same.
    curl -XGET 'localhost:9200/the_index/address/_search?routing=john&pretty' -d '{ "query" : { "has_parent" : { "parent_type" : "user", "query" : { "term" : { "_id" : "john" } } } } }';

    # Retrieving the record by itself. Works fine.
    curl -XGET 'localhost:9200/the_index/address/smithshouse?parent=john';

    # Querying for the user record with the same kind of query. Got a hit.
    curl -XGET 'localhost:9200/the_index/user/_search?pretty' -d '{ "query" : { "bool" : { "must" : { "term" : { "_parent" : "mrsmith" } } } } }';

The output:

    {"acknowledged":true}
    {"_index":"the_index","_type":"account","_id":"mrsmith","_version":1,"created":true}
    {"_index":"the_index","_type":"user","_id":"john","_version":1,"created":true}
    {"_index":"the_index","_type":"address","_id":"smithshouse","_version":1,"created":true}
    { "took" : 54, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 0, "max_score" : null, "hits" : [ ] } }
    { "took" : 221, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 0, "max_score" : null, "hits" : [ ] } }
    { "took" : 35, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "failed" : 0 }, "hits" : { "total" : 0, "max_score" : null, "hits" : [ ] } }
    { "took" : 481, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "failed" : 0 }, "hits" : { "total" : 0, "max_score" : null, "hits" : [ ] } }
    {"_index":"the_index","_type":"address","_id":"smithshouse","_version":1,"found":true,"_source":{ "baz" : "baz" }}
    { "took" : 65, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 1, "max_score" : 1.0, "hits" : [ { "_index" : "the_index", "_type" : "user", "_id" : "john", "_score" : 1.0, "_source":{ "bar" : "bar" } } ] } }

You can see from the results that ES reached the required shard, but no records were fetched. I am probably doing something wrong; if so, please correct me.
Re: Logstash output to Elastic search is not working
What do you mean by "Can't see anything from the following command output: #curl http://localhost:9200/_search?pretty" from your first post?

On Wednesday, January 14, 2015 at 3:27:57 AM UTC+1, zal...@gmail.com wrote:

Hi Marc, I didn't find any .sincedb file in the file system. The problem is still there.

On Tuesday, January 13, 2015 at 8:39:57 PM UTC+8, Marc wrote:

It all looks OK to me, since one can see that the Logstash process is added as a node. However, you should try to remove the .sincedb files in your home directory. If sincedb files exist and you are trying to analyze identical log files, Logstash will know that it has already read the data and will wait for new log entries in the file, so nothing will happen.

On Tuesday, January 13, 2015 at 10:05:10 AM UTC+1, zal...@gmail.com wrote:

Hi all, I started experimenting with ELK today, unfortunately without success. Everything installed properly and is running without any error. When I start Logstash with the following command, output to STDOUT is fine, but nothing is seen in Elasticsearch:

    #./logstash agent -e 'input { stdin {} } output { elasticsearch { host => localhost } stdout { codec => rubydebug } }'

What should I do? 
Elastic search's console output is: [2015-01-13 15:55:48,072][INFO ][node ] [Apollo] started [2015-01-13 15:55:51,392][INFO ][gateway ] [Apollo] recovered [1] indices into cluster_state [2015-01-13 15:55:51,422][INFO ][cluster.service ] [Apollo] added {[logstash-0.0.0.0-21484-2010][O0emX_s0SmauCfqAC_YaTA][inet[/172.16.4.88:9302]]{client=true, data=false},[logstash-suricata-3299-4010][cKVoEM8zT8KPVIAelpMSsg][suricata][inet[/172.16.4.88:9301]]{client=true, data=false},}, reason: zen-disco-receive(join from node[[logstash-suricata-3299-4010][cKVoEM8zT8KPVIAelpMSsg][suricata][inet[/172.16.4.88:9301]]{client=true, data=false}]) [2015-01-13 15:57:44,028][INFO ][cluster.service ] [Apollo] removed {[logstash-0.0.0.0-21484-2010][O0emX_s0SmauCfqAC_YaTA][inet[/172.16.4.88:9302]]{client=true, data=false},}, reason: zen-disco-node_failed([logstash-0.0.0.0-21484-2010][O0emX_s0SmauCfqAC_YaTA][inet[/172.16.4.88:9302]]{client=true, data=false}), reason transport disconnected [2015-01-13 16:01:29,656][INFO ][cluster.service ] [Apollo] added {[logstash-0.0.0.0-22435-2010][LdUiD4llTY6S7eiN8Z97ag][inet[/172.16.4.88:9302]]{client=true, data=false},}, reason: zen-disco-receive(join from node[[logstash-0.0.0.0-22435-2010][LdUiD4llTY6S7eiN8Z97ag][inet[/172.16.4.88:9302]]{client=true, data=false}]) [2015-01-13 16:21:07,373][INFO ][cluster.service ] [Apollo] removed {[logstash-0.0.0.0-22435-2010][LdUiD4llTY6S7eiN8Z97ag][inet[/172.16.4.88:9302]]{client=true, data=false},}, reason: zen-disco-node_failed([logstash-0.0.0.0-22435-2010][LdUiD4llTY6S7eiN8Z97ag][inet[/172.16.4.88:9302]]{client=true, data=false}), reason transport disconnected [2015-01-13 16:25:07,143][INFO ][cluster.service ] [Apollo] added {[logstash-0.0.0.0-24108-2010][k2ToeYbPRtW_LH4PLBcL-A][inet[/172.16.4.88:9302]]{client=true, data=false},}, reason: zen-disco-receive(join from node[[logstash-0.0.0.0-24108-2010][k2ToeYbPRtW_LH4PLBcL-A][inet[/172.16.4.88:9302]]{client=true, data=false}]) Can't see anything from the following 
command output: #curl http://localhost:9200/_search?pretty Please help me on this.
Re: How can I sort results by _id?
This is because the _id is a string field, so comparison is based on lexicographical order, not numeric order.

On Thu, Jan 15, 2015 at 11:04 AM, Jason Zhang moc...@gmail.com wrote:

What confuses me is that the 'sorted' results are still partly unordered. Also, if I query:

    { "range": { "_id": { "gt": 1, "lt": 1 } } }

the results contain _id: 199989.

On Thursday, January 15, 2015 at 5:48:48 PM UTC+8, Adrien Grand wrote:

Making it "index": "not_analyzed" should work. What is the issue with the results? Note that loading the _id in fielddata is typically very costly, since the _id field is typically unique per document.

On Thu, Jan 15, 2015 at 10:35 AM, Jason Zhang moc...@gmail.com wrote:

I use a query DSL like:

    { "filter": { "exists": { "field": "info" } }, "sort": { "_id": "desc" } }

And the _id here is an integer like '123'. But the result is like:

    { "took": 50, ... "hits": { ... "hits": [ { ... "sort": [ null ] }] } }

Also, I've tried to add

    "_id": { "index": "not_analyzerd" }

in the _mapping. This time the sort section returns values, but I find the results are still partly unordered. Can I sort results by _id? How? Thank you.

-- Adrien Grand
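The difference between lexicographical and numeric ordering is easy to reproduce outside Elasticsearch. The sketch below uses made-up ID values (only 123 and 199989 appear in the thread) and plain Python sorting; it illustrates the ordering rule, not Elasticsearch internals.

```python
# IDs as Elasticsearch sees them: strings, not numbers.
ids = ["9", "10", "123", "199989", "2"]

# String comparison goes character by character, so "10" sorts before "2".
as_strings = sorted(ids)

# Converting to int first gives the order the poster expected.
as_numbers = sorted(ids, key=int)

print(as_strings)   # ['10', '123', '199989', '2', '9']
print(as_numbers)   # ['2', '9', '10', '123', '199989']

# The same rule explains surprising range results: as a string,
# "199989" falls between "1" and "2" (hypothetical bounds).
print("1" < "199989" < "2")   # True
```

One common workaround, not mentioned in the thread, is to also store the ID in a separate numeric field and sort or filter on that field instead of on _id.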
Re: How can I sort results by _id?
No, an ID has to be a string.

-- Itamar Syn-Hershko http://code972.com | @synhershko https://twitter.com/synhershko Freelance Developer Consultant Author of RavenDB in Action http://manning.com/synhershko/

On Thu, Jan 15, 2015 at 12:12 PM, Jason Zhang moc...@gmail.com wrote:

Can I specify its type as integer in _mapping? Because the _id I use is rewritten.

On Thursday, January 15, 2015 at 6:07:22 PM UTC+8, Adrien Grand wrote:

This is because the _id is a string field, so comparison is based on lexicographical order, not numeric order.

On Thu, Jan 15, 2015 at 11:04 AM, Jason Zhang moc...@gmail.com wrote:

What confuses me is that the 'sorted' results are still partly unordered. Also, if I query:

    { "range": { "_id": { "gt": 1, "lt": 1 } } }

the results contain _id: 199989.

On Thursday, January 15, 2015 at 5:48:48 PM UTC+8, Adrien Grand wrote:

Making it "index": "not_analyzed" should work. What is the issue with the results? Note that loading the _id in fielddata is typically very costly, since the _id field is typically unique per document.

On Thu, Jan 15, 2015 at 10:35 AM, Jason Zhang moc...@gmail.com wrote:

I use a query DSL like:

    { "filter": { "exists": { "field": "info" } }, "sort": { "_id": "desc" } }

And the _id here is an integer like '123'. But the result is like:

    { "took": 50, ... "hits": { ... "hits": [ { ... "sort": [ null ] }] } }

Also, I've tried to add

    "_id": { "index": "not_analyzerd" }

in the _mapping. This time the sort section returns values, but I find the results are still partly unordered. Can I sort results by _id? How? Thank you.

-- Adrien Grand
How can I sort results by _id?
I use a query DSL like:

    { "filter": { "exists": { "field": "info" } }, "sort": { "_id": "desc" } }

And the _id here is an integer like '123'. But the result is like:

    { "took": 50, ... "hits": { ... "hits": [ { ... "sort": [ null ] }] } }

Also, I've tried to add

    "_id": { "index": "not_analyzerd" }

in the _mapping. This time the sort section returns values, but I find the results are still partly unordered. Can I sort results by _id? How? Thank you.
Re: How can I sort results by _id?
What confuses me is that the 'sorted' results are still partly unordered. Also, if I query:

    { "range": { "_id": { "gt": 1, "lt": 1 } } }

the results contain _id: 199989.

On Thursday, January 15, 2015 at 5:48:48 PM UTC+8, Adrien Grand wrote:

Making it "index": "not_analyzed" should work. What is the issue with the results? Note that loading the _id in fielddata is typically very costly, since the _id field is typically unique per document.

On Thu, Jan 15, 2015 at 10:35 AM, Jason Zhang moc...@gmail.com wrote:

I use a query DSL like:

    { "filter": { "exists": { "field": "info" } }, "sort": { "_id": "desc" } }

And the _id here is an integer like '123'. But the result is like:

    { "took": 50, ... "hits": { ... "hits": [ { ... "sort": [ null ] }] } }

Also, I've tried to add

    "_id": { "index": "not_analyzerd" }

in the _mapping. This time the sort section returns values, but I find the results are still partly unordered. Can I sort results by _id? How? Thank you.

-- Adrien Grand
Re: How can I sort results by _id?
Can I specify its type as integer in the _mapping? I ask because the _id I use is rewritten.

On Thursday, January 15, 2015 at 6:07:22 PM UTC+8, Adrien Grand wrote:

This is because the _id is a string field, so comparison is based on lexicographical order, not numeric order.

-- Adrien Grand
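The string-versus-number point in this thread can be illustrated outside Elasticsearch. The following is a minimal plain-Python sketch, not Elasticsearch code; the sample ids and the range bounds "1" and "2" are made up for illustration. It shows that ids compared as strings sort differently from ids compared as integers, and that a string range can admit a value such as 199989.

```python
# Hypothetical ids, held as strings the way a string _id field is.
ids = ["9", "123", "199989", "20", "1000"]

# Lexicographic order compares character by character.
as_strings = sorted(ids, reverse=True)

# Numeric order compares the integer values.
as_numbers = sorted(ids, key=int, reverse=True)

print(as_strings)  # ['9', '20', '199989', '123', '1000']
print(as_numbers)  # ['199989', '1000', '123', '20', '9']

# A string range "1" < id < "2" (bounds are an assumption for illustration)
# admits every id that starts with "1" and has more characters.
in_string_range = [i for i in ids if "1" < i < "2"]
print(in_string_range)  # ['123', '199989', '1000']
```

One common workaround, assuming you control the documents, is to index the id a second time as an integer field (any field name you pick for it is your own choice) and sort on that field instead.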
Re: Questions about scaling elasticsearch with regard to the number of documents indexed per second
No, the whole point was: will Elasticsearch be able to index, say, 10,000 documents per second? If yes, I can simply hook up my Twitter code to ES. If not, I need to think about how to make that happen. Typically I've seen ES index only around 30 docs per second, which is pretty low. I am hoping Redis, Kafka, Logstash, etc. might give Elasticsearch some breathing room and enable it to index up to 10K docs per second.

On Thursday, January 15, 2015 at 10:47:31 AM UTC+5:30, David Pilato wrote:

You have a Twitter input, so you can extract content from Twitter and send it to Elasticsearch. No need to have Redis here.

-- David ;-) Twitter: @dadoonet / @elasticsearchfr / @scrutmydocs

On 15 Jan 2015 at 00:02, Chinch Pokli cpo...@gmail.com wrote:

Thanks. I'll have a look at the raw option. Regarding Logstash, I don't fully understand its utility. It says that it can take messages from a Redis server. But if I have to set up Redis, I could simply use the Redis river to index into Elasticsearch. Is there any additional benefit that Logstash would give me?

On Thursday, January 15, 2015 at 4:06:12 AM UTC+5:30, David Pilato wrote:

You should look at the raw option or, better, look at Logstash. My 2 cents. David

On 14 Jan 2015 at 23:29, Chinch Pokli cpo...@gmail.com wrote:

Hi, I am using Elasticsearch to index a Twitter stream. Until recently I was using the official river, which was working great, but I realized that it is throwing out much of the data (e.g. it does not store the number of followers). Is there a way to make the river store all the data? If not, I am fine with writing streaming code that will stream and index, but I have a concern: how many documents can Elasticsearch index per second? I might eventually need to index almost 10,000 documents (each document = 2 KB) per second (the current requirement is 100 documents per second). Is this even feasible? If yes, do I need to make any special modifications? Thanks in advance!
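Nobody in this thread mentions batching, but a common way to raise indexing throughput is to send documents through the bulk API in groups rather than one request per document. The sketch below only builds the newline-delimited request bodies and does not talk to a cluster; the index name "twitter", the type name "tweet", the batch size, and the sample documents are all assumptions for illustration.

```python
import json

def build_bulk_body(docs, index, doc_type):
    """Return a newline-delimited bulk request body for the given docs."""
    lines = []
    for doc in docs:
        # Action line: says where the following document should be indexed.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk format expects a newline after the last line as well.
    return "\n".join(lines) + "\n"

def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Made-up sample documents standing in for tweets.
tweets = [{"user": "u%d" % i, "followers": i} for i in range(5)]
bodies = [build_bulk_body(b, "twitter", "tweet") for b in batches(tweets, 2)]

# Rough data rate for the target in the thread: 10,000 docs/s at 2 KB each.
target_kb_per_second = 10000 * 2
print(len(bodies), target_kb_per_second)
```

Each body would be sent as one POST to the _bulk endpoint. The right batch size depends on hardware and document size, so it is something to measure rather than assume.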
Re: Just initialize shards when problems but no rebalance
Yes, I've seen that, but the problem is that when the threshold is reached it removes all shards from the server instead of removing just one and balancing. When that happens, the cluster starts moving shards everywhere and never stops. Another problem we are having is that in the file storage we see data from shards that are not assigned to that node, so it can't allocate anything in this dirty state.

2015-01-15 0:09 GMT-03:00 Mark Walkom markwal...@gmail.com:

You could do this, but it's a lot of manual overhead to deal with. However, ES does have some disk space awareness during allocation; take a look at http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/index-modules-allocation.html#disk

On 15 January 2015 at 10:57, Matías Waisgold mwaisg...@gmail.com wrote:

Hi, is there any setting I can give ES so that it automatically assigns unassigned shards but never rebalances the cluster? I've found several issues when rebalancing and prefer to do it manually. If I set cluster.routing.allocation.enable to none, nothing happens. If I set it to all, it starts rebalancing. Is it OK to combine cluster.routing.allocation.allow_rebalance set to none with cluster.routing.allocation.enable set to all? The issue is mainly that we are running low on disk. When that happens, Elasticsearch removes all shards from an instance, ignores cluster.routing.allocation.cluster_concurrent_rebalance, and starts moving shards like crazy around the entire cluster, filling the storage on other instances so that it never stops balancing. Kind regards
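For the "allocate but do not rebalance" part of the question, one possible shape for a cluster settings update is sketched below. This is an assumption, not something confirmed in the thread: the setting cluster.routing.rebalance.enable is not mentioned by any poster, so check that it exists in your Elasticsearch version before relying on it. It would also only cover rebalancing, not the relocations triggered by the disk threshold that are described above.

```python
import json

# Body in the shape accepted by PUT /_cluster/settings.
# ASSUMPTION: cluster.routing.rebalance.enable is not named in the thread;
# verify it against the documentation for your version.
settings_body = {
    "transient": {
        # Keep assigning unassigned shards.
        "cluster.routing.allocation.enable": "all",
        # Ask the cluster not to move shards purely for balance.
        "cluster.routing.rebalance.enable": "none",
    }
}

payload = json.dumps(settings_body, indent=2, sort_keys=True)
print(payload)
```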
Re: How can we achieve an equivalent of this SQL query in Elasticsearch?
Thanks. Any other creative solutions?

On Thursday, January 15, 2015 at 1:54:10 PM UTC+5:30, David Pilato wrote:

I think you need to run two queries for now. One is an aggregation (max). The other one uses the result of this aggregation to search for documents. My 2 cents

-- David Pilato | Technical Advocate | Elasticsearch.com http://Elasticsearch.com | @dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr | @scrutmydocs https://twitter.com/scrutmydocs

On 15 Jan 2015 at 09:13, Lokesh Gupta lgu...@gmail.com wrote:

What would be the equivalent of the following query in the Elasticsearch world?

select myDate, col1, col2 from myTable where myDate = (select max(myDate) from myTable)
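The two-query approach suggested above could look roughly like this. It is a sketch under assumptions: it only builds the request bodies, the helper names are hypothetical, the response is a made-up stand-in, and the second request uses the filtered query syntax from the 1.x era of this thread with a range filter whose two bounds are equal.

```python
def max_date_request(field):
    """First request: ask only for the maximum value of `field`."""
    return {"size": 0, "aggs": {"max_date": {"max": {"field": field}}}}

def docs_at_value_request(field, value, fields):
    """Second request: fetch documents whose `field` equals `value`."""
    return {
        "query": {
            "filtered": {
                # gte == lte restricts the range to a single value.
                "filter": {"range": {field: {"gte": value, "lte": value}}}
            }
        },
        "fields": fields,
    }

first = max_date_request("myDate")

# Stand-in for the first response; the value is invented for illustration.
fake_response = {"aggregations": {"max_date": {"value": 1421280000000.0}}}
max_value = fake_response["aggregations"]["max_date"]["value"]

second = docs_at_value_request("myDate", max_value, ["myDate", "col1", "col2"])
print(first)
print(second)
```

The client has to make both round trips itself, and a document indexed between the two requests can change the maximum.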
Deploy Elasticsearch in live
Hi all, I use Elasticsearch locally on my PC as the search engine for a content website developed with the Django framework. I would like your opinion on choosing a hosting offer for production, ideally a scalable one. I have looked at the offers from DigitalOcean, Amazon EC2, and OVH (OVH VPC, runAbove ...). Amazon EC2 is free for the first year, but I do not know whether that offer is suitable for my application. The entry-level DigitalOcean offer is $5/month, but it has only 512 MB of memory. I also just received an email and found out that it is now possible to deploy Elasticsearch on Google Compute Engine. And what would the impact on this production configuration be if I planned to use Logstash and Kibana as well? Thank you in advance for your hosting advice.
Re: Http Cors Setting
In my case I faced the same issue because my web tier is hosted on a different domain. My configuration is working quite well: I can see the pre-flight (OPTIONS) call returning 200 and the subsequent POST or GET succeeding. I have used the following configuration:

http.cors.enabled: true
http.cors.allow-origin: my regex for my domains
http.cors.allow-methods: OPTIONS, HEAD, GET, POST, PUT, DELETE
http.cors.allow-credentials: true
http.cors.allow-headers: X-Requested-With, Content-Type, Content-Length, accept, authorization

You can use the Chrome developer tools (F12) to check which pre-flight headers your application sends and add them to the http.cors.allow-headers parameter.

On Tuesday, November 11, 2014 at 1:21:05 PM UTC+1, Reza Samee wrote:

Hello to all! Note: I'm new to ELK :) I'm using Elasticsearch 1.4.0 and I'm trying to enable the http.cors feature. When I set http.cors.enabled: true and http.cors.allow-origin: * in the config file and then restart, the http.cors feature is still not enabled and I can't use Kibana. What's wrong with my config file? elasticsearch.conf:

http.cors.enabled: true
http.cors.allow-origin: *
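The reply above leaves the origin regex unspecified. As a rough illustration of what such a pattern needs to accept and reject, here is a plain-Python sketch; it is not Elasticsearch code, the pattern is hypothetical, and example.com is a placeholder domain. It assumes the whole Origin value must match the pattern.

```python
import re

# Hypothetical pattern: example.com and its direct subdomains,
# over http or https, with an optional port.
ALLOWED_ORIGIN = re.compile(r"https?://([a-z0-9-]+\.)?example\.com(:[0-9]+)?")

def origin_allowed(origin):
    """Return True when the whole Origin header value matches the pattern."""
    return ALLOWED_ORIGIN.fullmatch(origin) is not None

for candidate in [
    "https://app.example.com",
    "http://example.com:8080",
    "https://example.com.evil.test",
]:
    print(candidate, origin_allowed(candidate))
```

Escaping the dots and matching the full value matters here: without both, a look-alike host such as the third candidate could slip through.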
Re: How can we achieve an equivalent of this SQL query in Elasticsearch?
Sorted query?

GET /myIndex/_search { "query": { "match_all": {} }, "fields": ["myDate", "col1"], "sort": [ { "myDate": { "order": "desc" } } ] }

On Thursday, January 15, 2015 at 1:05:22 PM UTC, Lokesh Gupta wrote:

Thanks. Any other creative solutions?
Re: How can we achieve an equivalent of this SQL query in Elasticsearch?
Thanks for the suggestion. A sorted query would work if I were okay with getting data for dates other than max(date). But in my use case I need to restrict the results to max(date) only. Is there a way to chain the output of one query as the input to another query?

On Thursday, January 15, 2015 at 7:10:51 PM UTC+5:30, Mark Harwood wrote:

Sorted query?

GET /myIndex/_search { "query": { "match_all": {} }, "fields": ["myDate", "col1"], "sort": [ { "myDate": { "order": "desc" } } ] }
Suggestion for an Elasticsearch plugin that forwards documents at indexing time
Hi all, I would like to know whether anyone has experimented with an Elasticsearch plugin that can forward documents to an external system at indexing time. I don't want to do it through Logstash, but only when a document is indexed. My goal is to use that plugin as an example for my own custom one: I would like a plugin that receives a copy of each indexed document so we can manipulate it in real time and then send it to an external database or interface. Thanks for all suggestions. Regards
need help for search
Hello, I am starting to learn Elasticsearch, and I have trouble understanding how search works. Could anyone help me? My gist with my whole structure and data is here: https://gist.github.com/thibaut1001/7a3000c3ff371be3a52d

My problem is only in part 4: searching several fields for one value, like this:

## We need to search henry in field selected
curl -XPOST 'http://localhost:9200/test_fr/user/_search' -d '{ "query": { "bool": { "must": [], "must_not": [], "should": [ { "term": { "user.sku": "henry" } }, { "term": { "user.internal_code": "henry" } }, { "term": { "user.firstname": "henry" } }, { "term": { "user.lastname": "henry" } }, { "term": { "user.address": "henry" } }, { "term": { "user.city": "henry" } }, { "term": { "user.localized_description": "henry" } }, { "term": { "user.localized_keywords": "henry" } }, { "term": { "user.service.localized_label": "henry" } }, { "term": { "user.medias.localized_label": "henry" } }, { "term": { "user.services.localized_label": "henry" } } ] } } }'
## Return no results

Why? I have many questions. Could you help me, please? Thanks
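As a side note on searching one value across many fields, a multi_match query is a more compact alternative to a long list of should clauses. The sketch below only builds the request body; it is not a diagnosis of why the query above returned nothing, the helper name is hypothetical, and the field list is a shortened selection from the post.

```python
import json

def multi_field_search(text, fields):
    """Build a search body that looks for `text` in every listed field."""
    return {"query": {"multi_match": {"query": text, "fields": fields}}}

# A subset of the fields named in the post.
fields = ["sku", "internal_code", "firstname", "lastname", "address", "city"]
body = multi_field_search("henry", fields)
print(json.dumps(body))
```

Unlike term, a match-style query analyzes the search text with the field's analyzer before looking it up, which is usually what is wanted for free-text fields such as names and addresses.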
Re: Excluding Terms Using a Minus Sign
Yes, the simple query string query supports this. See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html#query-dsl-simple-query-string-query

David

On 15 Jan 2015 at 20:37, Cindy Conway cindyanncon...@gmail.com wrote:

Is there a way to exclude a term if the user precedes it with a minus sign, the way Google does? For example, if I want to search for the word louvre but don't want the museum in France, I can search for: louvre -museum as my search terms. Does ES support this? I am not finding anything like that in the documentation. Thanks all!
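A request body for the simple query string query mentioned above might be built like this. It is a sketch: the helper name and the field names "title" and "body" are hypothetical, and default_operator is set to "and" on the assumption that the user wants documents that contain louvre and do not contain museum, rather than documents satisfying either condition.

```python
import json

def exclusion_search(user_input, fields):
    """Pass the user's text, including any -term, straight to the query."""
    return {
        "query": {
            "simple_query_string": {
                "query": user_input,
                "fields": fields,
                # Combine the terms with AND so a -term restricts the results.
                "default_operator": "and",
            }
        }
    }

body = exclusion_search("louvre -museum", ["title", "body"])
print(json.dumps(body, indent=2))
```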
Complex search
Hi, I'm looking to create a search behaviour like Amazon's. I have an index with 3 fields: Title, Description and Category. I want to search the Title and Description fields for the word *car* and get scored results like this:

car -- score: 1, in category vehicles
autocar -- score: 0.5, in category vehicles, where the part "car" should be highlighted, e.g. auto*car*
carradio -- score: 0.5, in category vehicles, where the part "car" should be highlighted, e.g. *car*radio

Also, if the word is found in the title field, the score should be higher than if it were found only in the description field. Is there anybody out there who could help me with this, or at least point me in the right direction? Thanks, Serge
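One possible starting point for the title-over-description weighting and the highlighting is sketched below. It only builds a request body, and several things are assumptions: the lowercase field names, the boost factor of 2, and the helper name. It does not produce the exact scores 1 and 0.5 from the post, and matching car inside autocar or carradio would additionally need something like n-gram analysis in the mapping, which is not shown.

```python
import json

def boosted_search(text):
    """Search title and description, weighting title matches higher."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                # "^2" boosts matches in the title field (factor is arbitrary).
                "fields": ["title^2", "description"],
            }
        },
        # Ask for highlighted fragments from both fields.
        "highlight": {"fields": {"title": {}, "description": {}}},
    }

body = boosted_search("car")
print(json.dumps(body, indent=2))
```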
Re: Is ElasticSearch truly scalable for analytics?
Adding a 'node reduce phase' to aggregations is something I'm very interested in, and I am also investigating it for the project I'm currently working on.

"If you introduce an extra reduction phase (for multiple shards on the same node) you introduce further potential for inaccuracies in the final results."

This is true if you only reduce the top-k items per shard, but I was thinking of reducing the complete set of buckets locally. This takes a bit more CPU and memory, but my guess is that this is negligible compared to the work already being done by the aggregation framework. If you reduce the buckets on the node before sending them to the coordinator, it will actually increase the accuracy of aggregations!

"how many of these sorts of use cases generate sufficiently large trees of results where a node-level merging would be beneficial"

It is primarily beneficial for bigger installations with lots of shards per machine, say 40 machines with ~100 shards per machine. With the current strategy, where every node sends 100 results, a lot of bandwidth is used on the coordinating node, since it receives 4000 responses when it could do with 40 (1 per machine). I acknowledge that it is a highly specialised use case that not many people run into, but it is a case I'm currently working on.

"How hard would it be to implement such a feature?"

I have been looking into this, and it is not trivial. It needs to be implemented in or around the SearchService, which is where I found the different search strategies (e.g. DFS) to be implemented. Unlike the rest of Elasticsearch, it does not seem to consist of modules that implement the different search strategies.

Regarding the accuracy of top-k lists: I think the above, both the 'node reduce phase' and making the search strategy pluggable, will be the groundwork for implementing the TJA or TPUT strategies discussed in an old issue [1] about the accuracy of facets.

The order of steps to take before reaching the ultimate goal would be:

1) Make search strategies (e.g. query then fetch, dfs query then fetch) more modular.
2) Make a search strategy with a 'node reduce phase' for the aggregations. Start with a complete reduce on the node. If that takes too much memory or time, you can use TJA or TPUT locally on the node to get a reliable top-k list.
3a) Make a search strategy that executes TJA on the cluster, coordinated by the coordinating node.
3b) Make a separate strategy that executes TPUT on the cluster, coordinated by the coordinating node.

I would say that 3a and 3b are 'easy' if doing a complete reduce in step 2 does not consume too many resources. Adding strategies for both TJA and TPUT gives ultimate control to the user, as TPUT is not suited for reliably sorting on sums where the field might contain a negative value, but TPUT has better latency than TJA. I would love to get an opinion from Adrien concerning the feasibility of such an approach.

-- Nils

[1] https://github.com/elasticsearch/elasticsearch/issues/1305

On Wednesday, January 14, 2015 at 7:47:07 PM UTC+1, Elliott Bradshaw wrote:

How hard would it be to implement such a feature? Even if there are only a handful of use cases, it could prove very helpful in these, particularly since very large trees are the ones that will struggle the most with bandwidth issues.

On Wednesday, January 14, 2015 at 1:36:53 PM UTC-5, Mark Harwood wrote:

"Understood, but what about cases where size is set to unlimited? Inaccuracies are not a concern in that case, correct?"

Correct. But if we only consider the scenarios where the key sets are complete and accuracy is not put at risk by merging (i.e. there is no top N type filtering in play), how many of these sorts of use cases generate sufficiently large trees of results where a node-level merging would be beneficial?

On Wednesday, January 14, 2015 at 1:09:48 PM UTC-5, Mark Harwood wrote:

If you introduce an extra reduction phase (for multiple shards on the same node) you introduce further potential for inaccuracies in the final results. Consider the role of 'size' and 'shard_size' in the terms aggregation [1] and the effects they have on accuracy. You'd arguably need a 'node_size' setting to also control the size of this new intermediate collection. All stages that reduce the volumes of data processed can introduce an approximation with the potential for inaccuracies upstream when merging.

[1] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_shard_size

On Wednesday, January 14, 2015 at 5:44:47 PM UTC, Elliott Bradshaw wrote:

Adrien, I get the feeling that you're a pretty heavy contributor to the aggregation module. In your experience, would a shard per cpu core strategy be an effective performance solution in a pure aggregation use case? If this could proportionally
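The idea of a complete node-level reduce, as proposed in this thread, can be sketched in a few lines of plain Python. This is a toy model, not Elasticsearch code: the bucket data and function names are invented, and it merges complete bucket sets (no per-shard top-k cut), which is the case the thread says does not lose accuracy.

```python
from collections import Counter

def reduce_buckets(partial_results):
    """Merge complete term -> count maps into one map."""
    merged = Counter()
    for buckets in partial_results:
        merged.update(buckets)  # adds counts key by key
    return merged

def top_k(buckets, k):
    """Highest counts first; ties broken by key for a stable result."""
    return sorted(buckets.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

# Invented per-shard results: node A holds two shards, node B holds one.
node_a = [{"x": 5, "y": 1}, {"y": 3, "z": 2}]
node_b = [{"x": 1, "z": 4}]

# Node reduce phase: each node merges its own shards first.
per_node = [reduce_buckets(node_a), reduce_buckets(node_b)]

# Coordinator reduce: merge one response per node.
final = reduce_buckets(per_node)
print(top_k(final, 2))  # [('x', 6), ('z', 6)]

# Response counts for the example in the thread: 40 machines, ~100 shards each.
machines, shards_per_machine = 40, 100
responses_without_node_reduce = machines * shards_per_machine
responses_with_node_reduce = machines
print(responses_without_node_reduce, responses_with_node_reduce)  # 4000 40
```

Because every bucket is kept at each stage, the result equals what a direct merge of all shard results would give; the saving is in the number of responses the coordinator has to receive.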
Re: Questions about scaling elasticsearch with regard to the number of documents indexed per second
I can index 1-12000 docs per second on my laptop. SSD drives, of course.

-- David ;-) Twitter: @dadoonet / @elasticsearchfr / @scrutmydocs

On 15 Jan 2015 at 13:43, Chinch Pokli cpo...@gmail.com wrote:

No, the whole point was: will Elasticsearch be able to index, say, 10,000 documents per second? If yes, I can simply hook up my Twitter code to ES. If not, I need to think about how to make that happen. Typically I've seen ES index only around 30 docs per second, which is pretty low. I am hoping Redis, Kafka, Logstash, etc. might give Elasticsearch some breathing room and enable it to index up to 10K docs per second.
Re: need help for search
i found my first error, no need user. because i search already in user. but why when i search a defined sku, no found only one ? curl -XPOST 'http://localhost:9200/test_fr/user/_search' -d '{ query : { bool: { must: [ ], must_not: [ ], should: [ { term: { sku: 01b3ae496c0142f993cf131c607fe003 } }, { term: { internal_code: 01b3ae496c0142f993cf131c607fe003 } }, { match: { firstname: 01b3ae496c0142f993cf131c607fe003 } }, { match: { lastname: 01b3ae496c0142f993cf131c607fe003 } }, { match: { address: 01b3ae496c0142f993cf131c607fe003 } }, { match: { city: 01b3ae496c0142f993cf131c607fe003 } }, { match: { localized_description: 01b3ae496c0142f993cf131c607fe003 } }, { match: { localized_keywords: 01b3ae496c0142f993cf131c607fe003 } }, { match: { service.localized_label: 01b3ae496c0142f993cf131c607fe003 } }, { match: { medias.localized_label: 01b3ae496c0142f993cf131c607fe003 } }, { match: { services.localized_label: 01b3ae496c0142f993cf131c607fe003 } } ] } } }'; they return all my users. Thanks Le jeudi 15 janvier 2015 14:58:16 UTC+1, Thibaut Owczarz a écrit : Hello, I start learning Elasticsearch, and i have a problem for understand how search. anyone could help me? 
My gist with all my structure and my data is here: https://gist.github.com/thibaut1001/7a3000c3ff371be3a52d

My problem is only in part 4, searching across multiple fields, like this:

## We need to search for henry in the selected fields
curl -XPOST 'http://localhost:9200/test_fr/user/_search' -d '{
  "query": {
    "bool": {
      "must": [],
      "must_not": [],
      "should": [
        { "term": { "user.sku": "henry" } },
        { "term": { "user.internal_code": "henry" } },
        { "term": { "user.firstname": "henry" } },
        { "term": { "user.lastname": "henry" } },
        { "term": { "user.address": "henry" } },
        { "term": { "user.city": "henry" } },
        { "term": { "user.localized_description": "henry" } },
        { "term": { "user.localized_keywords": "henry" } },
        { "term": { "user.service.localized_label": "henry" } },
        { "term": { "user.medias.localized_label": "henry" } },
        { "term": { "user.services.localized_label": "henry" } }
      ]
    }
  }
}'
## Returns no results

Why? I have many questions. Could you help me please? Thanks
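The two queries in this thread differ in two ways: the follow-up drops the "user." prefix, and it uses match instead of term for the free-text fields. A minimal sketch of building that kind of request body programmatically is below. The split of fields into exact and free-text lists is an assumption made for illustration (the mapping in the gist is not reproduced here), and build_should_query is a hypothetical helper name.

```python
import json

# Assumed split: identifier-like fields get exact `term` clauses,
# free-text fields get analyzed `match` clauses. The real mapping may differ.
EXACT_FIELDS = ["sku", "internal_code"]
TEXT_FIELDS = ["firstname", "lastname", "address", "city"]


def build_should_query(text, exact_fields=EXACT_FIELDS, text_fields=TEXT_FIELDS):
    """Return a bool query body with one `should` clause per field."""
    should = [{"term": {field: text}} for field in exact_fields]
    should += [{"match": {field: text}} for field in text_fields]
    return {"query": {"bool": {"should": should}}}


if __name__ == "__main__":
    # The printed JSON can be passed to curl with -d against the _search endpoint.
    print(json.dumps(build_should_query("henry"), indent=2))
```

Field names are written without the type prefix, matching the correction described in the follow-up message.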
Seeking a Director of Data Engineering in Austin TX
Hello All!

I am a recruiter in Austin, TX trying to fill a Director of Data Engineering position for my client, also in Austin. They are ELK stack evangelists and would prefer someone with at least some knowledge of Lucene or Hadoop. This is really a great company to work for and probably the nicest client I have had the pleasure of working with. It is a permanent position offering great benefits, a laid-back atmosphere, and a very competitive salary with options. If you are interested, please feel free to contact me.

There will be no relocation assistance and no sponsorship at this time.

Traci Martin
512-640-3656
tmar...@intersysconsulting.com

Director of Data Engineering

Who we are:
Intersys Consulting is a leading Business Intelligence, Data Management, and Application Development professional services organization focused on providing solutions with real business value. We provide a customer-focused approach to building authentic partnerships with our clients with objective counsel from concept to deployment for a consistent voice through the dynamic IT environment.

What we look for:
Intersys Consulting is focused on finding and cultivating talent across the IT space. We have over 100 developers, project managers, business analysts, and data management professionals, most with over ten years of experience in their respective fields. In new hires we look for authenticity: be proud of who you are and what you bring to the table. We also look for candidates who consistently deliver the highest quality product and have a deep desire to improve not just themselves, but the organization as a whole.

The Position:
Intersys Consulting is seeking a Director of Data Engineering to work at our client site in Austin, Texas.
Primary Responsibilities:
- Build and optimize each component of our data pipeline
- Work with our data scientists to provide data in the optimal format
- Work with our DevOps team to ensure the data infrastructure is reliable and scalable
- Integrate with our data partners to enrich our first-party data with third-party sources
- Stay on top of cutting edge technologies to constantly improve and streamline our data systems

Qualifications:
- Experience with high performance, high traffic web systems
- Experience with monitoring systems: New Relic, ELK stack, etc.
- Experience with either Hadoop or Elasticsearch/Lucene and a desire and willingness to learn the other
Need help for Custom Score On Array Fields
Hi All,

Currently I have an array field, and I need to calculate a score based on the number of terms matched by a terms filter. For example, here is my mapping:

{ "tweet" : { "properties" : { "tags" : { "type" : "string", "index_name" : "tag" } } } }

My data will be indexed like this:

{ "tweet" : { "tags" : ["USA", "VN", "GM"] } }

So if I query with a terms filter for USA and GM, my score should be 2/3 (that is, the number of matched terms divided by the length of the tags array). The real score will be calculated with a more complex formula, but I just want to focus on this problem.

Thanks in advance.
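To pin down the arithmetic being asked for, here is a minimal sketch in plain Python of the ratio described above. It only illustrates the intended formula; it is not an Elasticsearch script, and matched_ratio is a hypothetical name. How the same calculation would be wired into a query is left open, as in the original question.

```python
def matched_ratio(doc_tags, query_tags):
    """Fraction of a document's tags that appear among the queried tags."""
    if not doc_tags:
        # Avoid dividing by zero for documents with no tags.
        return 0.0
    wanted = set(query_tags)
    matched = sum(1 for tag in doc_tags if tag in wanted)
    return matched / float(len(doc_tags))


if __name__ == "__main__":
    # The example from the message: tags [USA, VN, GM], filter on USA and GM.
    print(matched_ratio(["USA", "VN", "GM"], ["USA", "GM"]))
```

For the example in the message, two of the three tags match, so the function returns two thirds.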
Re: Grandchild is not getting fetched by parent id
Hi Iv,

You’d need to specify both parent and routing when you index grandchildren. See http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/grandparents.html

Masaru

On January 15, 2015 at 20:44:43, Iv Igi (sayon...@gmail.com) wrote:

I am experiencing an issue while trying to retrieve a grandchild record by its parent ID (child-grandchild relationship). The number of hits in the result is always zero. The same request works fine for the parent-child relationship.

My records are organized roughly like this:

Account --(one to one)-- User --(one to one)-- Address

My execution environment is:
- Fedora 21 CE
- openjdk 1.8.0_25
- ES 1.4.2

Here is a script that shows the problem:

# index creation
curl -XPUT 'localhost:9200/the_index/' -d '{ "mappings": { "account": {}, "user": { "_parent": { "type": "account" } }, "address": { "_parent": { "type": "user" } } } }'

# mrsmith account creation
curl -XPUT 'localhost:9200/the_index/account/mrsmith' -d '{ "foo": "foo" }'

# john user creation
curl -XPUT 'localhost:9200/the_index/user/john?parent=mrsmith' -d '{ "bar": "bar" }'

# smithshouse address creation
curl -XPUT 'localhost:9200/the_index/address/smithshouse?parent=john' -d '{ "baz": "baz" }'

# Here I am trying to retrieve a record. Getting zero hits.
curl -XGET 'localhost:9200/the_index/address/_search?pretty' -d '{ "query": { "bool": { "must": { "term": { "_parent": "john" } } } } }'

# Another approach with has_parent query type. Still getting zero hits.
curl -XGET 'localhost:9200/the_index/address/_search?pretty' -d '{ "query": { "has_parent": { "parent_type": "user", "query": { "term": { "_id": "john" } } } } }'

# OK, let's try a routed search. Nope.
curl -XGET 'localhost:9200/the_index/address/_search?routing=john&pretty' -d '{ "query": { "bool": { "must": { "term": { "_parent": "john" } } } } }'

# Routed has_parent query. Same.
curl -XGET 'localhost:9200/the_index/address/_search?routing=john&pretty' -d '{ "query": { "has_parent": { "parent_type": "user", "query": { "term": { "_id": "john" } } } } }'

# Retrieving a record by itself. Going just fine.
curl -XGET 'localhost:9200/the_index/address/smithshouse?parent=john'

# Querying for user record with the same query. Got a hit.
curl -XGET 'localhost:9200/the_index/user/_search?pretty' -d '{ "query": { "bool": { "must": { "term": { "_parent": "mrsmith" } } } } }'

The output:

{"acknowledged":true}
{"_index":"the_index","_type":"account","_id":"mrsmith","_version":1,"created":true}
{"_index":"the_index","_type":"user","_id":"john","_version":1,"created":true}
{"_index":"the_index","_type":"address","_id":"smithshouse","_version":1,"created":true}
{ "took" : 54, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 0, "max_score" : null, "hits" : [ ] } }
{ "took" : 221, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 0, "max_score" : null, "hits" : [ ] } }
{ "took" : 35, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "failed" : 0 }, "hits" : { "total" : 0, "max_score" : null, "hits" : [ ] } }
{ "took" : 481, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "failed" : 0 }, "hits" : { "total" : 0, "max_score" : null, "hits" : [ ] } }
{"_index":"the_index","_type":"address","_id":"smithshouse","_version":1,"found":true,"_source":{ "baz" : "baz" }}
{ "took" : 65, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 1, "max_score" : 1.0, "hits" : [ { "_index" : "the_index", "_type" : "user", "_id" : "john", "_score" : 1.0, "_source":{ "bar" : "bar" } } ] } }

You can see from the results that ES hit the required shard, but no records were fetched. I am probably doing it the wrong way; if so, please correct me.
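Following Masaru's advice and the linked guide page, the grandchild has to be indexed with parent set to its direct parent and routing set to the grandparent's ID, so that all three generations end up on the same shard. The sketch below only builds the request URLs for the documents in this thread; index_url is a hypothetical helper, and no request is actually sent.

```python
from urllib.parse import urlencode


def index_url(host, index, doc_type, doc_id, parent=None, routing=None):
    """Build an index-request URL with optional parent and routing parameters."""
    params = []
    if parent is not None:
        params.append(("parent", parent))
    if routing is not None:
        params.append(("routing", routing))
    url = "http://%s/%s/%s/%s" % (host, index, doc_type, doc_id)
    if params:
        url += "?" + urlencode(params)
    return url


if __name__ == "__main__":
    host = "localhost:9200"
    # Child: parent alone is enough, since routing defaults to the parent ID.
    print(index_url(host, "the_index", "user", "john", parent="mrsmith"))
    # Grandchild: parent is the user, routing is the account (the grandparent).
    print(index_url(host, "the_index", "address", "smithshouse",
                    parent="john", routing="mrsmith"))
```

With the address indexed this way, searches for it would presumably need routing=mrsmith rather than routing=john, since that is the shard the document now lives on.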