Adding a new node: how to prime the data
We upgrade our clusters by adding new nodes, increasing the number of replicas on the indices, letting the new node catch up, then excluding the old node and reducing the number of replicas again. One cluster has a large index for which this operation takes hours. We tried to copy data from an existing node, but it copies everything regardless (I suspect it has no way to know what's new and what isn't?). We do plan to split that index into smaller shards, but in the meantime we are wondering if there is a better way of doing this? Thanks. --- http://yves.zioup.com gpg: 4096R/32B0F416 -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/60428bf4-675b-47bd-8b8b-e90e7e967b0b%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
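A commonly suggested alternative to the replica-bump dance is shard allocation filtering: you exclude the old node's IP via a transient cluster setting, and Elasticsearch relocates only the shards that live on that node instead of re-copying everything. A minimal sketch of building the request body (the IP address is a placeholder):

```python
import json

def decommission_settings(node_ip):
    """Build the transient cluster-settings body that tells Elasticsearch
    to move all shards off the node with the given IP."""
    return {
        "transient": {
            "cluster.routing.allocation.exclude._ip": node_ip
        }
    }

# The body would be PUT to /_cluster/settings, e.g.:
#   curl -XPUT localhost:9200/_cluster/settings -d '<body>'
body = json.dumps(decommission_settings("10.0.0.1"))
```

Once the excluded node holds no shards it can be shut down safely, without ever changing the replica count.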
Re: Changing Analyzer behavior for hyphens - suggestions?
Hi, thanks for the response and this awesome plugin bundle (especially for me as a German). Unfortunately the hyphen analyzer plugin didn't do the job the way I wanted it to. The hyphen analyzer does something similar to the whitespace analyzer: it just doesn't split on hyphens and instead treats them as ALPHANUM characters (at least that is what I think right now). So the term this-is-a-test gets tokenized into this-is-a-test, which is nice behaviour, but in order to do a full-text search on this field it should be tokenized into this-is-a-test, this, is, a and test, as I wrote before. I think maybe abusing the word_delimiter token filter could do the job, because there is an option preserve_original. Unfortunately, if you adjust the filter like this:

PUT /logstash-2014.11.20
{
  "index": {
    "analysis": {
      "analyzer": {
        "wordtest": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase", "word"]
        }
      },
      "filter": {
        "word": {
          "type": "word_delimiter",
          "generate_word_parts": false,
          "generate_number_parts": false,
          "catenate_words": false,
          "catenate_numbers": false,
          "catenate_all": false,
          "split_on_case_change": false,
          "preserve_original": true,
          "split_on_numerics": false,
          "stem_english_possessive": true
        }
      }
    }
  }
}

and make an analyze test:

curl -XGET 'localhost:9200/logstash-2014.11.20/_analyze?filters=word' -d 'this-is-a-test'

the response is this:

{"tokens":[{"token":"this","start_offset":0,"end_offset":4,"type":"ALPHANUM","position":1},{"token":"is","start_offset":5,"end_offset":7,"type":"ALPHANUM","position":2},{"token":"a","start_offset":8,"end_offset":9,"type":"ALPHANUM","position":3},{"token":"test","start_offset":10,"end_offset":14,"type":"ALPHANUM","position":4}]}

which says it tokenized everything except the original term, which makes me wonder whether the preserve_original setting is working at all? Any idea on this? On Wednesday, November 19, 2014 18:26:09 UTC+1, Jörg Prante wrote: You search for a hyphen-aware tokenizer, like this?
https://gist.github.com/jprante/cd120eac542ba6eec965 It is in my plugin bundle https://github.com/jprante/elasticsearch-plugin-bundle Jörg On Wed, Nov 19, 2014 at 5:46 PM, horst knete badun...@hotmail.de wrote: Hey guys, after working with the ELK stack for a while now, we still have a very annoying problem regarding the behavior of the standard analyzer: it splits terms into tokens using hyphens or dots as delimiters. E.g. logsource:firewall-physical-management gets split into firewall, physical and management. On one side that's cool, because if you search for logsource:firewall you get all the events with firewall as a token in the field logsource. The downside of this behaviour is that if you do e.g. a top-10 search on a field in Kibana, each token is counted as a whole term and rated by its count:

top 10: 1. firewall: 10  2. physical: 10  3. management: 10

instead of:

top 10: 1. firewall-physical-management: 10

In the standard mapping from logstash this is solved using a .raw field that is not_analyzed, but the downside of that is you get 2 fields instead of one (even if it's a multi_field) and the usability for Kibana users is not that great. So what we need is that logsource:firewall-physical-management gets tokenized into firewall-physical-management, firewall, physical and management. I tried this using the word_delimiter token filter with the following mapping:

"analysis": {
  "analyzer": {
    "my_analyzer": {
      "type": "custom",
      "tokenizer": "whitespace",
      "filter": ["lowercase", "asciifolding", "my_worddelimiter"]
    }
  },
  "filter": {
    "my_worddelimiter": {
      "type": "word_delimiter",
      "generate_word_parts": false,
      "generate_number_parts": false,
      "catenate_words": false,
      "catenate_numbers": false,
      "catenate_all": false,
      "split_on_case_change": false,
      "preserve_original": true,
      "split_on_numerics": false,
      "stem_english_possessive": true
    }
  }
}

But this
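For what it's worth, the target behaviour described above (keep the original hyphenated term and also emit its parts) can be sketched outside Elasticsearch as a toy tokenizer. This only illustrates the intended token stream, not how the word_delimiter filter is actually implemented:

```python
def tokenize_with_original(text):
    """Emit the original hyphenated word first, then its parts,
    mimicking whitespace tokenization plus a hyphen splitter
    with preserve_original: true."""
    tokens = []
    for word in text.split():
        parts = word.split("-")
        if len(parts) > 1:
            tokens.append(word)   # the preserved original term
        tokens.extend(parts)      # the split parts
    return tokens
```

With this behaviour, both `logsource:firewall` and `logsource:firewall-physical-management` would match, and Kibana's top-N would see the full term.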
Re: problem with heap space overusage
Anyone? On Wednesday, November 19, 2014 13:32:37 UTC+1, Serg Fillipenko wrote: We have contact profiles (20+ fields, containing nested documents) indexed, and their social profiles (10+ fields) indexed as child documents of the contact profile. We run complex bool match queries, delete-by-query, delete-children-by-query, and faceting queries on contact profiles.

index rate: 14.31 op/s
remove-by-query rate: 13.41 op/s (this high value is caused by the fact that we delete all child docs first, before indexing the parent, and then index the children again)
search rate: 2.53 op/s
remove by ids: 0.15 op/s

We started to face this trouble under ES 1.2, right after we started to index and delete (no search requests yet) child documents. On ES 1.4 we have the same issue. What sort of data is it, what sort of queries are you running and how often are they run? On 19 November 2014 17:52, tetlika tet...@gmail.com wrote: Hi, we have 6 servers and 14 shards in the cluster, the index size is 26GB, we have 1 replica so the total size is 52GB, ES v1.4.0, Java version 1.7.0_65. We use servers with 14GB of RAM (m3.xlarge), and the heap is set to 7GB. Around a week ago we started facing the following issue: random cluster servers, around once per day or two, hit the heap size limit (java.lang.OutOfMemoryError: Java heap space in the log), and the cluster fails, becoming red or yellow. We tried adding more servers to the cluster, even 8, but then it's only a matter of time until we hit the problem again; so no matter how many servers are in the cluster, it will still hit the limit after some time. Before we started facing the problem we were running smoothly with 3 servers. We also set indices.fielddata.cache.size: 40% but it didn't help. There are possible workarounds to decrease heap usage: 1) reboot some server, then heap drops below 70% and the cluster is OK for some time, or 2) decrease the number of replicas to 0 and then back to 1. But I don't like to use those workarounds. How can it run out of RAM when the whole index fits into RAM? Thanks much for any help.
Double entries in Kibana?
I am using logstash 1.4.1 and elasticsearch 1.1.1. My setup is showing an issue: for every new line (log) added to the log file I get two entries in Kibana, i.e. every log entry shows up twice in Kibana. However, when I check my logstash console, the log line shows only once. Any idea?
Re: Double entries in Kibana?
My elasticsearch console:

[2014-11-20 14:14:42,229][INFO ][cluster.metadata ] [Brothers Grimm] [logstash-2014.11.20] creating index, cause [auto(bulk api)], shards [5]/[1], mappings [_default_]
[2014-11-20 14:14:42,672][INFO ][cluster.metadata ] [Brothers Grimm] [logstash-2014.11.20] update_mapping [logs] (dynamic)
Issue with highlighting and analyzed tokens
Hi, I am experiencing an unexpected result with highlighting when using an _analyzer path in the mapping and custom analyzers. The highlighting returns no result for some query terms, even though the term matches and the document is returned. For other query terms it works fine. Somehow it seems that a different analyzer is used for querying than for highlighting. See the following commands to reproduce the issue: https://gist.github.com/fxh/3246df167e4d72b0372f I am using ES v1.3.4. Thanks for any hint. Felix
Best way to check a document has been indexed
Hello, I'm developing a piece of code that inserts a document into an Elasticsearch server. The code uses libcurl to set up an HTTP request and capture the response. So, in order to check whether a document has been properly indexed, what is the official or proper way to do it? This is an example of a correctly indexed document response:

HTTP/1.1 201 Created
Content-Type: application/json; charset=UTF-8
Content-Length: 92

{"_index":"someindex","_type":"sometype","_id":"fsAx6qXcQGCSrY1DWvQACw","_version":1,"created":true}

Should my program check that the first header line contains 201 Created? What should happen if a 3xx redirection occurs, should I consider it properly indexed too? Or should I instead just ignore the headers and check that the last part of the body string equals '"created":true}'? Thank you!
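A sketch of the status-line check being discussed. Elasticsearch answers the index request itself with 200 or 201; a 3xx would come from an intermediary (proxy, load balancer), so it should not be treated as success on its own:

```python
def index_succeeded(status_line):
    """Return True if the HTTP status line reports a successful index
    operation: 201 (document created) or 200 (existing document updated)."""
    parts = status_line.split()
    code = int(parts[1])  # e.g. "HTTP/1.1 201 Created" -> 201
    return code in (200, 201)
```

Checking the status code is more robust than substring-matching the body, since the body layout (field order, whitespace) is not guaranteed.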
How is the idf calculated for an alias that maps to multiple indexes?
If I have mapped an alias to more than one index and I execute a search using the alias name, will the idf be calculated for each individual index, or will the idf calculation take into consideration all of the indexes that are mapped to the alias?
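By default each shard computes idf from its own local statistics, so a search through an alias uses per-index (in fact per-shard) frequencies; global statistics require search_type=dfs_query_then_fetch. A rough numeric illustration using Lucene's classic idf formula, idf = 1 + ln(numDocs / (docFreq + 1)), with made-up counts:

```python
import math

def idf(num_docs, doc_freq):
    """Lucene classic-similarity idf."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Two indexes behind one alias; the term is rarer in index B.
idf_a = idf(num_docs=1000, doc_freq=100)       # index A's local statistics
idf_b = idf(num_docs=1000, doc_freq=10)        # index B's local statistics
idf_global = idf(num_docs=2000, doc_freq=110)  # combined statistics

# Per-index idf differs, so the same term can score differently
# depending on which index a hit comes from.
```

This is why identical documents in different indexes behind one alias can receive different scores unless global term statistics are requested.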
Search template in bulk
Hey, I was wondering if there is a way to execute search template queries in bulk. For example, I have a couple of search templates registered in the .scripts index. I want to run a bulk search using these templates, with a different set of parameters for each search in the bulk. An example query could be:

$ cat requests
{}
{ "template": { "id": "template1" }, "params": { "title": "burger" } }
{}
{ "template": { "id": "template2" }, "params": { "title": "pizza" } }

GET /blogs/post/_msearch/template --data-binary @requests; echo

Is a similar thing possible/planned? Cheers, Viranch
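The request file above is newline-delimited JSON: an (empty) header line followed by one body line per search. Whether the _msearch/template endpoint exists is exactly the question being asked here, but the payload itself is easy to assemble; a sketch using the example's template ids and params:

```python
import json

def build_msearch_template_body(searches):
    """Build an NDJSON payload for bulk templated searches: for each
    (template_id, params) pair, emit an empty header line plus a body line."""
    lines = []
    for template_id, params in searches:
        lines.append("{}")  # header: index/type come from the URL
        lines.append(json.dumps({"template": {"id": template_id}, "params": params}))
    return "\n".join(lines) + "\n"

body = build_msearch_template_body([
    ("template1", {"title": "burger"}),
    ("template2", {"title": "pizza"}),
])
```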
MLT query delivering strange results
I have been trying to figure out how exactly the more_like_this query behaves. The doc says: "Under the hood, more_like_this simply creates multiple should clauses in a bool query of interesting terms extracted from some provided text." But I found several examples that I could not explain. This one illustrates it. I am using elasticsearch-1.4.0 and creating an index like this (no mapping defined beforehand):

curl -XPUT 'localhost:9200/twitter/tweet/1' -d '{"user": "user1", "message": "aaa"}'
curl -XPUT 'localhost:9200/twitter/tweet/2' -d '{"user": "user1", "message": "aaa bbb"}'
curl -XPUT 'localhost:9200/twitter/tweet/3' -d '{"user": "user1", "message": "bbb aaa"}'
curl -XPUT 'localhost:9200/twitter/tweet/4' -d '{"user": "user2", "message": "bbb"}'
curl -XPUT 'localhost:9200/twitter/tweet/5' -d '{"user": "user2", "message": "aaa bbb"}'
curl -XPUT 'localhost:9200/twitter/tweet/6' -d '{"user": "user2", "message": "bbb aaa"}'

Then I query it:

curl -XGET 'http://localhost:9200/twitter/tweet/_search?pretty=true&size=10' -d '{
  "query": {
    "more_like_this_field": {
      "message": {
        "like_text": "aaa bbb",
        "percent_terms_to_match": 1,
        "min_term_freq": 1,
        "max_query_terms": 3,
        "min_doc_freq": 1
      }
    }
  }
}'

The response:

{ "took": 3, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 5, "max_score": 14.4000225, "hits": [
    { "_index": "twitter", "_type": "tweet", "_id": "4", "_score": 14.4000225, "_source": {"user": "user2", "message": "bbb"} },
    { "_index": "twitter", "_type": "tweet", "_id": "2", "_score": 12.729599,  "_source": {"user": "user1", "message": "aaa bbb"} },
    { "_index": "twitter", "_type": "tweet", "_id": "5", "_score": 12.72813,   "_source": {"user": "user2", "message": "aaa bbb"} },
    { "_index": "twitter", "_type": "tweet", "_id": "3", "_score": 12.728111,  "_source": {"user": "user1", "message": "bbb aaa"} },
    { "_index": "twitter", "_type": "tweet", "_id": "6", "_score": 12.5501995, "_source": {"user": "user2", "message": "bbb aaa"} }
  ] } }

So document 1, "aaa", is missing. I get the same result if I use "like_text": "bbb aaa" in the above query. However, if I use "like_text": "aaa" I get what I would expect: all texts except "bbb" are returned. What kind of should-query is generated by more_like_this in the above example? I would have expected:

curl -XGET 'http://localhost:9200/twitter/tweet/_search?pretty=true&size=10' -d '{
  "query": {
    "bool": {
      "should": [
        { "match": { "message": "aaa" } },
        { "match": { "message": "bbb" } }
      ],
      "minimum_should_match": 2
    }
  }
}'

but this obviously returns neither "aaa" nor "bbb". Why does the above more_like_this query return "bbb" but not "aaa"?
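The expected bool query can be modeled in a few lines to check the set logic. This mimics only which documents *match* under minimum_should_match, not Elasticsearch's scoring or what more_like_this actually generates:

```python
def matching_docs(docs, terms, minimum_should_match):
    """Return the ids of docs whose message contains at least
    minimum_should_match of the given terms (whitespace tokenization)."""
    hits = []
    for doc_id, message in docs.items():
        tokens = set(message.split())
        if sum(t in tokens for t in terms) >= minimum_should_match:
            hits.append(doc_id)
    return sorted(hits)

docs = {1: "aaa", 2: "aaa bbb", 3: "bbb aaa",
        4: "bbb", 5: "aaa bbb", 6: "bbb aaa"}
# With minimum_should_match=2, only documents containing both terms match,
# which is why the hand-written bool query returns neither doc 1 ("aaa")
# nor doc 4 ("bbb") -- unlike the observed more_like_this result.
```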
Custom Aggregation / Access to documents
When implementing a custom aggregation: can I access the result documents in my aggregator, so that I can skip result documents based on their properties? To make it clearer, let me explain. I have an index "products" that contains product documents. A product contains a nested collection of variant documents. The requirement is to have a query that returns variant documents, kind of a nested aggregation. To complicate things: not all variants should be returned. Some dynamic filtering has to be applied, and this filtering depends on properties of the nested variant documents. I need to peek at all variants contained in a product in order to determine whether a variant should be included in the result or not. I am thinking that I could accomplish this by writing a plugin containing the custom aggregation, if my initial question can be answered with yes. Thanks for your suggestions and insights. A.
Re: What is the best practice for periodic snapshotting with awc-cloud+s3
Hello, sorry for hijacking this thread, but I'm currently also pondering the best way to perform periodic snapshots in AWS. My main concern is that we are using blue-green deployment with ephemeral storage on EC2, so if for some reason there is a problem with the cluster, we might lose a lot of data; therefore I would rather do frequent snapshots (for this reason, we are still using the deprecated S3 gateway). The thing is, you claim that having too many snapshots is problematic and that one should prune old snapshots. Since snapshots are incremental, this will imply data loss, correct? Also, is the problem related to the number of snapshots or the size of the data? Is there any way to merge old snapshots into one, and would this solve the problem? Finally, if I create a cron job to make automatic snapshots, can I run into problems if two instances attempt to create a snapshot with the same name at the same time? Also, what's the best way to do a snapshot on shutdown? Should I put a script in init.d/rc.0 to run on shutdown before elasticsearch shuts down? I've seen cases where EC2 instances have not-so-graceful shutdowns, so it would be wonderful if there is a better way to do this at the cluster level (i.e., if node A notices that node B is not responding, it automatically makes a snapshot). Sorry if some of these questions don't make much sense; I'm still quite new to elasticsearch and have not completely understood the new snapshot feature. On Friday, November 14, 2014 08:19:42 UTC, Sally Ahn wrote: Yes, I am now seeing the snapshots complete in about 2 minutes after switching to a new, empty bucket. I'm not sure why the initial request to snapshot to the empty repo was hanging, because the snapshot did in fact complete in about 2 minutes, according to the S3 timestamp. Time to automate deletion of old snapshots. :) Thanks for the response! On Thursday, November 13, 2014 9:35:20 PM UTC-8, Igor Motov wrote: Having too many snapshots is problematic. Each snapshot is done in an incremental manner, so in order to figure out what changed and what is available, all snapshots in the repository need to be scanned, which takes more time as the number of snapshots grows. I would recommend pruning old snapshots as time goes by, or starting snapshots into a new bucket/directory, if you really need to maintain 2-hour resolution for 2-month-old snapshots. The get command can sometimes hang because it's throttled by the ongoing snapshot. On Wednesday, November 12, 2014 9:02:33 PM UTC-10, Sally Ahn wrote: I am also interested in this topic. We were snapshotting our cluster of two nodes every 2 hours (invoked via a cron job) to an S3 repository (we were running ES 1.2.2 with cloud-aws plugin version 2.2.0, then we upgraded to ES 1.4.0 with cloud-aws plugin 2.4.0, but are still seeing the issues described below). I've been seeing an increase in the time it takes to complete a snapshot with each subsequent snapshot. I see a thread https://groups.google.com/forum/?fromgroups#!searchin/elasticsearch/snapshot/elasticsearch/bCKenCVFf2o/TFK-Es0wxSwJ where someone else was seeing the same thing, but that thread seems to have died. In my case, snapshots have gone from taking ~5 minutes to taking about an hour, even between snapshots where the data does not seem to have changed. For example, you can see below a list of the snapshots stored in my S3 repo. Each snapshot is named with a timestamp of when my cron job invoked the snapshot process. The S3 timestamp on the left shows the completion time of that snapshot, and it's clear that it's steadily increasing:

2014-09-30 10:05   686  s3://bucketname/snapshot-2014.09.30-10:00:01
2014-09-30 12:05   686  s3://bucketname/snapshot-2014.09.30-12:00:01
2014-09-30 14:05   736  s3://bucketname/snapshot-2014.09.30-14:00:01
2014-09-30 16:05   736  s3://bucketname/snapshot-2014.09.30-16:00:01
...
2014-11-08 00:52  1488  s3://bucketname/snapshot-2014.11.08-00:00:01
2014-11-08 02:54  1488  s3://bucketname/snapshot-2014.11.08-02:00:01
...
2014-11-08 14:54  1488  s3://bucketname/snapshot-2014.11.08-14:00:01
2014-11-08 16:53  1488  s3://bucketname/snapshot-2014.11.08-16:00:01
...
2014-11-11 07:00  1638  s3://bucketname/snapshot-2014.11.11-06:00:01
2014-11-11 08:58  1638  s3://bucketname/snapshot-2014.11.11-08:00:01
2014-11-11 10:58  1638  s3://bucketname/snapshot-2014.11.11-10:00:01
2014-11-11 12:59  1638  s3://bucketname/snapshot-2014.11.11-12:00:01
2014-11-11 15:00  1638  s3://bucketname/snapshot-2014.11.11-14:00:01
2014-11-11 17:00  1638  s3://bucketname/snapshot-2014.11.11-16:00:01

I suspected that this gradual increase was related to the accumulation of old snapshots after I tested the following: 1. I created a brand new cluster with the same hardware specs in the same datacenter and
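The pruning cron job discussed in this thread mostly comes down to deciding which snapshot names are old enough to delete via the snapshot API. A sketch of that selection step, assuming the snapshot-&lt;timestamp&gt; naming used in the listing above:

```python
from datetime import datetime, timedelta

def snapshots_to_prune(names, now, keep_days):
    """Given snapshot names like 'snapshot-2014.11.08-02:00:01', return
    those whose embedded timestamp is more than keep_days before now."""
    cutoff = now - timedelta(days=keep_days)
    old = []
    for name in names:
        stamp = name.split("snapshot-", 1)[1]
        taken = datetime.strptime(stamp, "%Y.%m.%d-%H:%M:%S")
        if taken < cutoff:
            old.append(name)
    return old

# Each returned name would then be removed with
# DELETE /_snapshot/<repo>/<name>, one at a time, since deletes
# of incremental snapshots must not run concurrently.
```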
Re: upgrading from 0.90.7 to 1.4. Gotchas?
I would be interested too, we are using the same 0.90.7 version. Jason On Thu, Nov 20, 2014 at 2:03 PM, Yves Dorfsman y...@zioup.com wrote: Are there any precautions to take before upgrading from 0.9 to 1.4? Different data types? Different API calls? etc. And what is the best way to upgrade? Can we just add a node at the newer version and let it pull the data? Thanks. http://yves.zioup.com gpg: 4096R/32B0F416
Re: Best way to check a document has been indexed
Hi, just check whether it is 201 (created) or 200 (an existing document was updated). The HTTP status code alone should be sufficient. Thanks Vineeth On Thu, Nov 20, 2014 at 4:01 PM, asanchez asanchez1...@gmail.com wrote: Hello, I'm developing a piece of code that inserts a document into an Elasticsearch server. The code uses libcurl to set up an HTTP request and capture the response. So, in order to check whether a document has been properly indexed, what is the official or proper way to do it? This is an example of a correctly indexed document response:

HTTP/1.1 201 Created
Content-Type: application/json; charset=UTF-8
Content-Length: 92

{"_index":"someindex","_type":"sometype","_id":"fsAx6qXcQGCSrY1DWvQACw","_version":1,"created":true}

Should my program check that the first header line contains 201 Created? What should happen if a 3xx redirection occurs, should I consider it properly indexed too? Or should I instead just ignore the headers and check that the last part of the body string equals '"created":true}'? Thank you!
Re: Custom Aggregation / Access to documents
Hi, I think you should be able to achieve the functionality you need without writing a custom aggregation. If you use a filter aggregation wrapped in a nested aggregation, you should be able to filter the child documents (variants) before they are returned. Then, if you want to return the top X variants, you can use the top_hits aggregation as a sub-aggregation of the filter aggregation. Hope this helps, Colin On Thursday, 20 November 2014 11:51:40 UTC, AndyP wrote: When implementing a custom aggregation: can I access the result documents in my aggregator, so that I can skip result documents based on their properties? To make it clearer, let me explain. I have an index "products" that contains product documents. A product contains a nested collection of variant documents. The requirement is to have a query that returns variant documents, kind of a nested aggregation. To complicate things: not all variants should be returned. Some dynamic filtering has to be applied, and this filtering depends on properties of the nested variant documents. I need to peek at all variants contained in a product in order to determine whether a variant should be included in the result or not. I am thinking that I could accomplish this by writing a plugin containing the custom aggregation, if my initial question can be answered with yes. Thanks for your suggestions and insights. A.
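The nested > filter > top_hits tree Colin describes can be sketched as a request body; the path and field names here (variants, variants.color) are made up for illustration:

```python
def variants_agg(nested_path, term_field, term_value, size):
    """Build a nested > filter > top_hits aggregation body: step into the
    nested variant docs, keep only those matching the term filter, then
    return the top matching ones."""
    return {
        "aggs": {
            "variants": {
                "nested": {"path": nested_path},
                "aggs": {
                    "matching": {
                        "filter": {"term": {term_field: term_value}},
                        "aggs": {
                            "top": {"top_hits": {"size": size}}
                        }
                    }
                }
            }
        }
    }

body = variants_agg("variants", "variants.color", "red", 3)
```

This stays entirely within built-in aggregations; the trade-off is that the filter must be expressible as a query, whereas the "peek at all sibling variants" requirement from the question may still need scripting or a plugin.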
Re: how to migrate lucene index into elasticsearch
Thanks Jörg for the guidance. I am trying the suggested approach #1 and I have a further question on it. As you mentioned: *a custom written tool could traverse the segments and extract field information and build a rudimentary mapping (without analyzer, without info about _all and _source and all Elasticsearch add-ons).* We already have the Lucene index metadata (i.e. field names, types, analyzers etc.) available as XML, so I can create the mapping without traversing the segments. Should I create the segments file segments.gen using the mapping file and some dummy values, and then put in all the other old Lucene index files (except segments.gen) from the existing index (e.g. segments_2, _0.cfe, _0.cfs, _0.si, _1.cfe, _1.cfs etc.)? *Sample mapping XML file:*

<Mapping>
  <indexField>
    <analyzed>true</analyzed>
    <fieldanalyzer>Standard</fieldanalyzer>
    <indexFieldName>AddressLine1</indexFieldName>
    <name>AddressLine1</name>
    <stored>true</stored>
    <type>string</type>
  </indexField>
  <indexField>
    <analyzed>true</analyzed>
    <fieldanalyzer>Standard</fieldanalyzer>
    <indexFieldName>Building_Name</indexFieldName>
    <name>Building_Name</name>
    <stored>true</stored>
    <type>string</type>
  </indexField>
  <indexField>
    <analyzed>true</analyzed>
    <fieldanalyzer>Keyword</fieldanalyzer>
    <indexFieldName>GNAF_PID</indexFieldName>
    <name>GNAF_PID</name>
    <stored>true</stored>
    <type>string</type>
  </indexField>
  ...
</Mapping>

Thanks On Thu, Nov 13, 2014 at 11:59 PM, joergpra...@gmail.com wrote: It is almost impossible to use a binary-only Lucene index for migration, because Elasticsearch needs additional info which is not available in Lucene. The only method is to reindex the data over the Elasticsearch API. There is a bumpy road, but I don't know if anyone has ever tried it: - a custom written tool could traverse the segments, extract field information, and build a rudimentary mapping (without analyzer, without info about _all and _source and all Elasticsearch add-ons) - another tool could try to reconstruct docs (like the tool Luke does) and write them to a file in bulk format. Not having the source of the docs means it must be possible to retrieve the original input from the Lucene index (which is almost never the case) - the result could be re-indexed using the Elasticsearch API (assuming all analyzers and tokenizers are in place), but a lot of work would have to be done. The preferred way is to rewrite the code that uses the Lucene API to use the Elasticsearch API, and re-run the indexing process. Jörg On Thu, Nov 13, 2014 at 7:11 PM, Gaurav gupta gupta.gaurav0...@gmail.com wrote: Hi all, I have an embedded search engine in our product which is based on Lucene 4.8.1, and I would now like to migrate it to the latest Elasticsearch 1.4 for better distributed support (sharding and replication, mainly). Could you guide me on how one should migrate existing indexes created by Lucene to ES? I have referred to the mail thread "migrate lucene index into elasticsearch" https://groups.google.com/forum/#!searchin/elasticsearch/migrating/elasticsearch/xCE7124eAL8/ZFluLXqO_IcJ and based on the discussion there, it appears that it's not an easy job, or perhaps not feasible at all. I am wondering if there is some plugin (river), tool or workaround available to migrate existing Lucene indexes to ES. I found that a tool is available for Solr to ES migration: http://blog.trifork.com/2013/01/29/migrating-apache-solr-to-elasticsearch/. Do we have something similar for Lucene to ES migration? Thanks Gaurav
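The second step of the "bumpy road" above, writing reconstructed docs to a file in bulk format, amounts to emitting one action/metadata line plus one source line per document. A sketch of that serialization (index and type names are placeholders):

```python
import json

def to_bulk_ndjson(docs, index, doc_type):
    """Serialize (id, source) pairs into Elasticsearch bulk-API NDJSON:
    an action/metadata line followed by a source line per document."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps(
            {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"

# The resulting file can be sent with:
#   curl -XPOST localhost:9200/_bulk --data-binary @file.ndjson
out = to_bulk_ndjson([("1", {"AddressLine1": "1 Main St"})], "addresses", "doc")
```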
Re: Issue with highlighting and analyzed tokens
I remember there was a GitHub issue about path-specified analyzers and highlighting, but I can't find it. Reading it may be your best bet.

On Thu, Nov 20, 2014 at 5:14 AM, fe...@squirro.com wrote:

Hi, I am experiencing an unexpected result with highlighting when using an _analyzer path in the mapping together with custom analyzers. Highlighting returns no result for some query terms, even though the term matches and the document is returned; for other query terms it works fine. Somehow it seems that different analyzers are used for querying and for highlighting. See the following commands to reproduce the issue: https://gist.github.com/fxh/3246df167e4d72b0372f I am using ES v1.3.4. Thanks for any hint. Felix
Is Elasticsearch also supported on AIX and HP Itanium 11.31
Is Elasticsearch also supported on AIX and HP Itanium 11.31? I didn't find this information in the release notes or installation instructions. Thanks, Gaurav
Re: analyzing wildcard queries ...
Hi Jörg,

Just wanted to tell you that I will not / cannot fork or commit my improvement on wildcard analysis, because I'm no longer 100% convinced that it is really an improvement, or that it can be used in general. After rethinking it, I must admit I was probably too focused on my concrete issue with email addresses and the standard analyzer, e.g. marco.kamm@brain.net analyzed into the tokens [marco.kamm] [brain.net]. The original idea behind using the standard analyzer was that users would find something when searching for brain.net or marco.kamm without having to use any wildcards (the old Lucene standard analyzer also split on '.' characters, so even marco or brain could be found). Somehow I thought it would also make sense to search for e.g. marco.*@brain.net or marco.kamm@*.net.

My first improvement approach was based on the existing code, but instead of concatenating all the analyzed sub-string parts into a single wildcard query, I tried to build a boolean query containing the individual analyzed parts as either prefix or wildcard queries, e.g.:

marco.*@brain.net -> marco* AND *brain.net
marco.kamm@*.net -> marco.kamm* AND *net

The first part can be a plain prefix query (when not preceded by a wildcard char) and the last one a postfix query; everything in between was surrounded by '*'...'*'.

Another (optimized) approach is based on the following technique: generate a random letter sequence that is not present in the search term, replace the wildcards with this sequence, and feed it to the analyzer. This way, if the analyzer produces more than one token from a single wildcard input, you can be sure that the original input would also be split into more terms, and you need more than a single query object. After analyzing, process the resulting tokens one by one and combine them into a boolean AND query. For each token, undo the wildcard replacement and check the occurrences of wildcard characters: if a token contains no wildcards at all, use a term query; if it only contains a wildcard char at the end, use a prefix query; otherwise use a wildcard query. E.g.:

marco.*@brain.net -> marco.{randomLetterSequence}@brain.net -> [marco.{randomLetterSequence}] [brain.net] -> marco.* AND brain.net
marco.kamm@*.net -> marco.kamm@{randomLetterSequence}.net -> [marco.kamm] [{randomLetterSequence}.net] -> marco.kamm AND *.net

These approaches could work for my cases (at least they produce some results where the original code didn't find anything, although the results may be inaccurate; that lies in the nature of AND combinations, e.g. marco.*@brain.net transformed into marco.* AND brain.net could also find brain@marco.org, etc.). But I think for most cases (where the queried field uses an analyzer that doesn't split terms into several tokens, e.g. the keyword analyzer), the existing code already makes the best effort that can be done in a generic way, without knowing what the analyzer does with certain characters.

Maybe you can use something out of my second approach: testing the analyzer's behaviour by replacing the wildcards with something that doesn't get eaten up, to see whether the input is split or not (I think a sequence of plain ASCII letters could work, but I'm not sure it would serve as a general solution, e.g. for Japanese analyzers; to me a sequence of ASCII letters seems like a lowest common denominator).

For the moment we're trying to live with the current best-effort approach, maybe analyzing some fields twice (once with a standard analyzer or similar, and additionally with a keyword analyzer) and directing pure wildcard queries to the keyword field. Or maybe we're going to split email addresses into separate username and domain fields, etc.

Thank you anyway for your time.

Cheers,
Marco

On Wednesday, 19 November 2014 09:56:43 UTC+1, mka...@gmail.com wrote:

Hi, I have text/email addresses indexed with the standard analyzer, e.g. marco.k...@brain.net, which results in two tokens being in the index: [marco.kamm] and [brain.net]. I want to search using a query_string query and wildcards like:

{ "fields": ["contact_email"], "query": { "query_string": { "query": "(contact_email:(marco.*@brain.net))", "default_operator": "and", "analyze_wildcard": true } } }

From my past working experience with Lucene I know that wildcard queries are kind of problematic, because they're not analyzed by default (to work around this behaviour I wrote a custom parser that prepares the query string depending on the specific field analyzer before passing it to the Lucene query parser). At first, when I noticed the analyze_wildcard parameter/option, I thought: great, I no longer need my custom magic parser, Elasticsearch provides built-in support for my problem. When testing the analyze_wildcard behaviour with pure prefix queries like marco.kamm@brain.* it worked like a charm, resp. did the same thing I
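Marco's second approach — masking the wildcards with a random letter sequence, running the analyzer, then mapping each resulting token back to a term/prefix/wildcard clause — can be sketched as below. The `analyze` function is a toy stand-in for the real field analyzer (an assumption on my part; the standard analyzer's actual behaviour is more involved), and clause names are illustrative:

```python
import re
import secrets
import string

def analyze(text):
    # Toy stand-in for the field analyzer: lowercase and split on '@' and
    # whitespace, mimicking how the thread's examples tokenize
    # (marco.kamm@brain.net -> [marco.kamm] [brain.net]).
    return [t.lower() for t in re.split(r"[@\s]+", text) if t]

def wildcard_clauses(query):
    # Pick a random letter sequence that does not occur in the query,
    # so masking and unmasking the wildcards is unambiguous.
    while True:
        sentinel = "".join(secrets.choice(string.ascii_lowercase) for _ in range(12))
        if sentinel not in query:
            break
    masked = query.replace("*", sentinel)
    clauses = []
    for token in analyze(masked):
        token = token.replace(sentinel, "*")  # undo the wildcard replacement
        if "*" not in token:
            clauses.append(("term", token))
        elif token.endswith("*") and token.count("*") == 1:
            clauses.append(("prefix", token[:-1]))
        else:
            clauses.append(("wildcard", token))
    return clauses  # to be AND-ed together, e.g. in a bool/must query
```

For the thread's examples this yields marco.* AND brain.net, and marco.kamm AND *.net, matching the transformations described above.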
Re: Does nested query with operator honor the operator or does it always display some default behavior
Hi Ivan: I tried using the _explain API endpoint to get an explanation; it returned this:

{ "_index": "news", "_type": "swift", "_id": "_explain", "_version": 5, "created": false }

I tried adding explain: true as part of my query, which resulted in this:

"_explanation": { "value": 10.384945, "description": "Score based on child doc range from 75103316 to 75103366" }

That said, if you think the syntax is not familiar, how do you suggest the query be created? (Of course, I could split the query into a boolean query with two MUST nested conditions, which does result in the documents I am looking for.) However, if I have a list of more than two values to be searched for, the query becomes unwieldy. The Java API does seem to allow a list of values to be passed in. Here is a code snippet of how I am using the Java API:

qb = QueryBuilders.nestedQuery(fieldName,
        QueryBuilders.boolQuery()
            .must(QueryBuilders.matchQuery(fieldName + ".v", values)
                .operator(MatchQueryBuilder.Operator.AND))
            .must(QueryBuilders.rangeQuery(fieldName + ".s").gte(0.6)));

where values is a List of values. Please let me know if I am using the API incorrectly.

Thanks
Ramdev

On Wednesday, 19 November 2014 14:13:41 UTC-6, Ivan Brusic wrote: As mentioned before, that syntax seems strange to me. I have never seen an array used with a match query. I wonder what the resulting Lucene query is. I think that analyzed/non-analyzed might just be a red herring. What does the explanation output say? -- Ivan

On Wed, Nov 19, 2014 at 10:24 AM, Ramdev Wudali agas...@gmail.com wrote: The fields I am searching against are analyzed by the default analyzer. The query, as I noted in my question, was generated using the Java API, so the array syntax is the API's interpretation. That said, I ran a few more experiments. If the field is not analyzed (unlike my original case), the query works and returns the right documents (meaning both values exist in the returned documents). But if the fields are analyzed, the operator is not honored. So now my question is: why would analyzed fields cause the operator not to be honored? And does the operator within a nested query depend on whether the nested field is analyzed or not?

Ramdev

On Tuesday, 18 November 2014 14:45:53 UTC-6, Ivan Brusic wrote: I have never seen the array syntax with the match query, so I am not sure what the behavior should be. Since your search terms are not analyzed in your example, a terms query with a minimum match of 100% should work. If not, perhaps create a single search term out of your existing terms? -- Ivan

On Tue, Nov 18, 2014 at 10:23 AM, Ramdev Wudali agas...@gmail.com wrote: Hi: I have the following query:

{ "query": { "bool": { "must": { "nested": { "query": { "bool": { "must": [ { "match": { "NESTED_FIELD.v": { "query": ["AAPL.OQ", "GOOGL.OQ"], "operator": "and" } } }, { "range": { "NESTED_FIELD.s": { "from": 0.6, "to": null, "include_lower": true, "include_upper": true } } } ] } }, "path": "NESTED_FIELD" } } } }, "filter": { "bool": { "must": [ { "range": { "DOC_DATE.v": { "from": "2014-08-19T20:00:00.000-04:00", "to": "2014-10-18T23:59:59.999Z", "include_lower": true, "include_upper": true } } } ] } } }

The behavior I expect: the returned documents should contain both values for NESTED_FIELD.v (AAPL.OQ and GOOGL.OQ) where the corresponding NESTED_FIELD.s range condition is also satisfied. The behavior I see: the returned documents contain either one of the values (AAPL.OQ, or GOOGL.OQ, or both). I want only documents that have both values. So operator: "and" (and its variant operator: "AND") does not seem to have any effect. Any pointers or suggestions regarding this are much appreciated.
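The workaround Ramdev mentions — splitting into a boolean query with one MUST nested clause per value — generalizes to any number of values. A sketch of building that request body (field names follow the thread; the exact JSON construction is mine, not the poster's code):

```python
def all_values_nested_query(path, values, min_score=0.6):
    """Require every value to match in some nested object under `path`
    whose score field also satisfies the range condition."""
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "nested": {
                            "path": path,
                            "query": {
                                "bool": {
                                    "must": [
                                        {"match": {path + ".v": value}},
                                        {"range": {path + ".s": {"gte": min_score}}},
                                    ]
                                }
                            },
                        }
                    }
                    for value in values  # one nested MUST clause per value
                ]
            }
        }
    }

query = all_values_nested_query("NESTED_FIELD", ["AAPL.OQ", "GOOGL.OQ"])
```

Because each value sits in its own nested clause, a document only matches when every value is present, which is the AND semantics the original single match-with-array query failed to deliver.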
Re: If I use EC2 Discovery Plugin do I necessarily give internet access to my instances?
I had the same problem yesterday. What I did was create an Elastic IP and associate it with the EC2 instance. In the security group you need to open both the private IP and the Elastic IP. Try it.

On Wednesday, November 19, 2014 8:01:48 AM UTC-5, David Vasquez wrote: Hi everyone! I'm trying to configure tight security rules for my Elasticsearch cluster, meaning the network access rules must be exactly what is needed. Now I've found that the EC2 discovery plugin makes a call to AWS (ec2.us-east-1.amazonaws.com:443), and for that I would need to give internet access to my Elasticsearch instances. That is a big drawback for my security configuration, because I cannot tie the call to a fixed IP, nor to a fixed port, and hence my access rules would be wide open. Can you please tell me how you manage this security issue on AWS? Thank you very much!
Deleted indices keep coming back w/ 1.4.0
Hi, Since we upgraded to 1.4.0, deleted indices in our time-series index set keep coming back right after deletion. Whenever we drop an expired index (usually as midnight rolls over), it gets deleted and removed from the alias it was under, but about half the time it comes back as an empty index, as you can see in the Marvel screenshot below (read from bottom to top).

https://lh4.googleusercontent.com/-iXhabN33WIw/VG4GDf3Q8QI/AC4/jLF_dGBpGIg/s1600/Screen%2BShot%2B2014-11-20%2Bat%2B10.07.58%2BAM.png

Just wanted to make sure you guys are aware of this bug. D.
ES seems to be aliasing the byte type to the short type
Hi everyone, I was experimenting with mappings for index-size optimization purposes and I ran into an issue. It seems like a bug to me, and I cannot find any documentation about it. When I declare a field of type byte, ES seems to treat it as short. For proof, see the error message of the last curl below: it mentions the short type even though I declared a byte (MapperParsingException[failed to parse [some_data]]; nested: JsonParseException[Numeric value (32768) out of range of Java short). Everything has been tested on a freshly untarred ES.

# Create the index
curl -XPUT 'http://localhost:9200/some_index?pretty' -d '
{ "mappings": { "some_type": { "dynamic": "strict", "properties": { "some_data": { "type": "byte" } } } } }'

# Insert a doc with a value just out of the range of the byte type: success, weird
curl -XPUT 'http://localhost:9200/some_index/some_type/1?pretty' -d '
{ "some_data": 256 }'

# Insert a doc with the max value for the short type: success, still weird
curl -XPUT 'http://localhost:9200/some_index/some_type/1?pretty' -d '
{ "some_data": 32767 }'

# Insert a doc with a value just out of the range of the short type: failure. OK, I get it, ES sees it as a short...
curl -XPUT 'http://localhost:9200/some_index/some_type/1?pretty' -d '
{ "some_data": 32768 }'

java -version outputs: java version "1.7.0_07", Java(TM) SE Runtime Environment (build 1.7.0_07-b10), Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)
lsb_release -a outputs: Distributor ID: Ubuntu, Description: Ubuntu 12.04.5 LTS, Release: 12.04, Codename: precise
uname -r outputs: 3.1.10-1.9-ec2
ES info: ES 1.4.0

Thanks in advance for the help.
Damien
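The accepted and rejected values in Damien's test line up exactly with Java's short range rather than its byte range, which is what makes the aliasing suspicion plausible. A quick check of the ranges involved:

```python
# Java numeric ranges behind the ES "byte" and "short" mapping types.
BYTE_MIN, BYTE_MAX = -(2**7), 2**7 - 1       # -128 .. 127
SHORT_MIN, SHORT_MAX = -(2**15), 2**15 - 1   # -32768 .. 32767

def fits(value, lo, hi):
    return lo <= value <= hi

# Reproducing the observations above: 256 and 32767 exceed the byte range
# yet were accepted, and only 32768 (outside the short range) was rejected,
# i.e. the field behaves as if validated against short bounds.
observations = {
    value: (fits(value, BYTE_MIN, BYTE_MAX), fits(value, SHORT_MIN, SHORT_MAX))
    for value in (256, 32767, 32768)
}
```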
Re: upgrading from 0.90.7 to 1.4. Gotchas?
I can't remember what 0.90.x was like, as that was long ago for us, but we recently upgraded from 1.1.0 to 1.4.0. Look at http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/breaking-changes.html and additionally pay attention to:

- scripting: the replacement of MVEL with Groovy, and dynamic scripting being disabled by default. We elected to install the MVEL plugin manually, change our scripts to identify that they are MVEL (lang=mvel), and make some minor adjustments for compatibility (such as using _score instead of doc.score in scripts). We will do the upgrade from MVEL to Groovy separately, to take care of the security concerns with MVEL.
- a lot of percolator changes in 1.x
- multi-field changes in 1.0.0
- the disk space allocation decider configuration format changed somewhere in 1.x (if you're configuring that)
- enable CORS if you're using elasticsearch-head (see https://github.com/mobz/elasticsearch-head/issues/170)

In general, I would go through the release notes at http://www.elasticsearch.org/downloads/ and look under breaking changes for every version since your last version.

On Thursday, November 20, 2014 7:47:04 AM UTC-5, Jason Wee wrote: I would be interested too, we are using the same 0.90.7 version. Jason

On Thu, Nov 20, 2014 at 2:03 PM, Yves Dorfsman yv...@zioup.com wrote: Are there any precautions to take before upgrading from 0.9 to 1.4? Different data types? Different API calls? etc. And what is the best way to upgrade? Can we just add a node at the newer version and let it pull the data? Thanks. http://yves.zioup.com gpg: 4096R/32B0F416
Re: upgrading from 0.90.7 to 1.4. Gotchas?
The most surprising part of my upgrade from 0.90 to 1.0.1 was the drop in indexing performance. So yes, I'm also interested in hearing about any gotchas.

On 20 November 2014, 8:47 PM, Jason Wee peich...@gmail.com wrote: I would be interested too, we are using the same 0.90.7 version. Jason

On Thu, Nov 20, 2014 at 2:03 PM, Yves Dorfsman y...@zioup.com wrote: Are there any precautions to take before upgrading from 0.9 to 1.4? Different data types? Different API calls? etc. And what is the best way to upgrade? Can we just add a node at the newer version and let it pull the data? Thanks. http://yves.zioup.com gpg: 4096R/32B0F416
Re: upgrading from 0.90.7 to 1.4. Gotchas?
Also, I forgot to mention: if you have native scripts, they will mysteriously throw an UnsupportedOperationException whenever invoked. It looks like a mistake was made in 1.4.0 (now reverted on master) that requires you to override setScorer in native scripts. That's OK, I just wish it had been documented in the breaking changes.

On Thursday, November 20, 2014 1:03:17 AM UTC-5, Yves Dorfsman wrote: Are there any precautions to take before upgrading from 0.9 to 1.4? Different data types? Different API calls? etc. And what is the best way to upgrade? Can we just add a node at the newer version and let it pull the data? Thanks. http://yves.zioup.com gpg: 4096R/32B0F416
ElasticSearch 1.3.4 - Duplicate data sometimes
Hi, I was wondering how I might troubleshoot issues with duplicate data coming back from queries. I perform an aggregate query, something like this:

final SearchResponse searchResponse = client()
    .prepareSearch(indexName)
    .setTypes(OBJ_TYPE)
    .setFetchSource(true)
    .setExplain(true)
    .addSort("dateCreated.value", SortOrder.DESC)
    .addSort("recId", SortOrder.DESC)
    .setSize(1000)
    .addAggregation(AggregationBuilders.filter("filter1")
        .filter(filterBuilder).subAggregation(rangeBuilder))
    .execute().actionGet();

The values returned from this query include duplicate object ids, but only sometimes. I've looked at our Elasticsearch config files and don't see any way this could happen. The filters only reduce based on some attributes; I can't think of any reason this could occur. John
Re: ES backups without using snapshots?
I have never used the plugins, but there is also Jörg's tool: https://github.com/jprante/elasticsearch-knapsack -- Ivan

On Wed, Nov 19, 2014 at 11:27 PM, Mathew D mathew.degerh...@gmail.com wrote: Hi Ivan, Thanks for the quick response. We've got 5 shards per index, so with 2 replicas each node should in theory have a full set of data. I was hoping that taking the node out of service by stopping it would avoid the disruption of pausing indexing, but I couldn't find any documentation confirming that such an operation would leave the data files in a consistent state that could reliably be used for restore. Evan's suggestion of elasticdump looks like the closest to what I'm after, although unfortunately I don't have node.js/npm installed (and in an enterprise that could be tricky to get installed). NB: I hear your concerns re cluster design. Incorporating the remote node was chosen to minimise data loss following a data centre failure; however, because of the risk of split brain, the node actually functions more as a warm DR than any sort of HA. Regards, Mat

On Thursday, November 20, 2014 2:32:14 PM UTC+13, Ivan Brusic wrote: How many shards for each index? I am assuming that each node does not have all the data. If you can stop indexing, you can just rsync the data to a local directory. Make sure you execute a flush, and preferably an optimize, in order to merge the segments on disk. The tricky part is the manual combine you referred to. BTW, 3 nodes / 2 data centers? Sounds like a recipe for trouble. :) Cheers, Ivan

On Wed, Nov 19, 2014 at 7:41 PM, Mathew D mathew.d...@gmail.com wrote: Hi there, Any suggestions as to how I can create full ES backups without using the snapshot functionality? The reason I can't use snapshots is that they require a shared directory mounted on all nodes, but my 3-node cluster spans two data centres and I am not able to NFS-mount over the WAN. I'm also not permitted to back up to AWS/S3. As I have 2 replicas of each index, I'm leaning towards stopping one node and backing up that node's data directory, but wondered if anyone could suggest a more elegant way. For example, could I snapshot to a local directory on each node, then manually combine the contents into a single cohesive backup? Regards, Mat
min_doc_count in nested aggregations
Hi, I have several aggregations, each of which has its own inner aggregations. It seems that min_doc_count does not apply when the containing aggregation bucket is itself empty. I presumed that because both levels of aggregation use min_doc_count, there would be buckets for the inner aggs as well. Can somebody enlighten me on why ES cannot do this? Some technical insight would be appreciated. Thanks. Here is a snippet of my query:

... "aggregations": { "totalCount": { "global": {} }, "categories-missing": { "terms": { "field": "categories.missing", "size": 0, "min_doc_count": 0, "order": { "_term": "asc" } } }, "datasetId": { "terms": { "field": "datasetId", "size": 0, "min_doc_count": 0, "order": { "_term": "asc" } }, "aggregations": { "attributes-Default": { "terms": { "field": "attributes.Default", "size": 0, "min_doc_count": 0, "order": { "_term": "asc" } } }, "attributes-Administrative_information": { "terms": { "field": "attributes.Administrative_information", "size": 0, "min_doc_count": 0, "order": { "_term": "asc" } } }, ...
Re: If I use EC2 Discovery Plugin do I necessarily give internet access to my instances?
Yes, but this might not be an option if your instance is in a private subnet. It also means handling all your IPs like this (though in theory you don't need internal IPs; a security group id/name would do as well), and there are limits to how many rules you can add to a security group. At the same time, adding an EIP would complicate the OP's apparent security requirements.

On 20/11/2014 12:04 pm, wellszh...@xteros.com wrote: I had the same problem yesterday. What I did was create an Elastic IP and associate it with the EC2 instance. In the security group you need to open both the private IP and the Elastic IP. Try it.

On Wednesday, November 19, 2014 8:01:48 AM UTC-5, David Vasquez wrote: Hi everyone! I'm trying to configure tight security rules for my Elasticsearch cluster, meaning the network access rules must be exactly what is needed. Now I've found that the EC2 discovery plugin makes a call to AWS (ec2.us-east-1.amazonaws.com:443), and for that I would need to give internet access to my Elasticsearch instances. That is a big drawback for my security configuration, because I cannot tie the call to a fixed IP, nor to a fixed port, and hence my access rules would be wide open. Can you please tell me how you manage this security issue on AWS? Thank you very much!
Re: Changing Analyzer behavior for hyphens - suggestions?
The whitespace tokenizer has the problem that punctuation is not ignored. I find the word_delimiter filter does not work well with the whitespace tokenizer, only with the keyword tokenizer, and then only with massive pattern matching, which is complex and expensive :( Therefore I took the classic tokenizer and generalized the hyphen rules in the grammar. The hyphen tokenizer and the hyphen filter are two routines: the hyphen tokenizer keeps hyphenated words together and handles punctuation correctly, while the hyphen filter adds combinations to the original form. The main point is to add combinations of dehyphenated forms so they can be searched. Single words are only taken into account when the word is positioned at the edge. For example, the phrase der-die-das should be indexed in the following forms: der-die-das, derdiedas, das, derdie, derdie-das, die-das, der. Jörg

On Thu, Nov 20, 2014 at 9:29 AM, horst knete baduncl...@hotmail.de wrote: So the term this-is-a-test gets tokenized into this-is-a-test, which is nice behaviour, but in order to do a full-text search on this field it should get tokenized into this-is-a-test, this, is, a and test, as I wrote before.
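The combination rules Jörg describes can be illustrated with a small sketch. This is my reconstruction from the der-die-das example in the post, not the plugin's actual grammar:

```python
def hyphen_variants(term):
    """Generate dehyphenated combinations of a hyphenated term: the
    original, the fully joined form, the edge words, prefix
    concatenations, and joined prefixes with the hyphenated remainder."""
    parts = term.split("-")
    variants = {term, "".join(parts)}          # original + fully joined
    if len(parts) > 1:
        variants.add(parts[0])                 # leading edge word
        variants.add(parts[-1])                # trailing edge word
        for i in range(2, len(parts)):
            variants.add("".join(parts[:i]))   # joined prefix, e.g. "derdie"
            variants.add("".join(parts[:i]) + "-" + "-".join(parts[i:]))  # "derdie-das"
            variants.add("-".join(parts[i - 1:]))                         # "die-das"
    return variants
```

For "der-die-das" this produces exactly the seven forms listed in the post.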
Re: Getting file text content from mapper?
Also, this is the first line of what's posted along the river: { index: {_index:resumes,_type:resume,_id:2158912}} Things can get truncated when they're as big as a Base64 encoded file :) On Wednesday, November 19, 2014 6:01:29 PM UTC-5, Raymond Giorgi wrote: Hey all, I'm hoping someone can help me out with something I'm having an issue with. The short: I'm trying to extract plaintext from the attachment-mapper. The long: I'm posting the contents of a file, Base64 encoded, to RabbitMQ, which is feeding an ElasticSearch river plugin. Querying against the field works fine, but it only seems to store the Base64 encoding of the file instead of the plaintext. I'd like to extract the contents as plaintext and have that be returnable (i.e. query for the text of a docx). I'm feeding it from a PHP front end, so there are places in the app where I'd like to rely on Elasticsearch's built-in Tika processor. Thanks!
Marvel / ES query document count major discrepancy
Howdy, I have been hitting my ES cluster pretty hard recently and I think it is holding up great. In the last few days, I have noticed a major discrepancy between the document count that Marvel shows and the result of a _count query against the actual ES cluster. Marvel is reporting about 43.9M documents while the ES query shows 8.7M. Where would this discrepancy come from? I would suspect it is a monitoring error on Marvel's part, but I'm not sure. Any ideas? Marvel Screenshot: https://www.dropbox.com/s/1y39wui96fpjc14/Screenshot%202014-11-20%2009.57.42.png?dl=0 ES Query: http://x/pa-2014-11-19/_count { "count": 8781919, "_shards": { "total": 5, "successful": 5, "failed": 0 } } Thanks, Mike
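One way to narrow this down is to compare counts at different scopes: Marvel typically aggregates across all indices (and its index stats can include deleted-but-not-yet-merged documents), while the query above hits a single index. A rough comparison, reusing the index name from the post:

```shell
curl 'localhost:9200/pa-2014-11-19/_count'      # documents in the one index queried above
curl 'localhost:9200/_count'                    # documents across every index in the cluster
curl 'localhost:9200/pa-2014-11-19/_stats/docs' # per-index doc stats, closer to what Marvel reads
```

If the cluster-wide count is near 43.9M, the discrepancy is just a difference in scope rather than a Marvel bug.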
Native script unable to get values, perhaps because it's a child doc? ES v1.1.1
Hello, I have a native script that I'm using to score/sort queries and it is not working properly for one of my three types. All three types have the same nested field, and I'm using the script to check values and score/sort by an externally defined order. However, for one of the three types the values pulled from the doc fields are always zero/null (using docFieldLongs(fieldName).getValue() or docFieldStrings(stringValue).getValue()). I can check for the fields to be present using doc().containsKey(), and it seems to see them, but it never actually sees any values. I've pulled a few records manually and verified that the data looks good. The only thing I can think of that's different is that this one type is a child of one of the other two, but I'm querying it completely independently of the parent in this case. Does this sound familiar to anyone by any chance?
Re: Uncertain field types when extracting fields from getSource() (java api)
A workaround is to cast the value to a Number and then call Number#longValue().
Re: Deleted indices keep coming back w/ 1.4.0
That's unlikely to be a bug; the only time ES will recreate an index is if it finds dangling data. Are your indexes created automatically? How is your data sent to ES? Is it possible that some data reaches ES more slowly than the rest, and the time difference causes this to happen? On 21 November 2014 02:19, David Smith davidksmit...@gmail.com wrote: Hi, Since we upgraded to 1.4.0, deleted indices in our time-series index set keep coming back right after deletion. So whenever we drop an expired index (usually as midnight rolls), it gets deleted and removed from the alias it was under. But about half the time it comes back as an empty index, as you can see from the marvel screenshot below (read from the bottom to the top). https://lh4.googleusercontent.com/-iXhabN33WIw/VG4GDf3Q8QI/AC4/jLF_dGBpGIg/s1600/Screen%2BShot%2B2014-11-20%2Bat%2B10.07.58%2BAM.png Just wanted to make sure you guys are aware of this bug. D.
Re: Is Elasticsearch also supported on AIX and HP Itanium 11.31
Depends what you mean by supported. I have seen comments from people running it on AIX, but I don't think it is officially supported. On 21 November 2014 00:37, Gaurav gupta gupta.gaurav0...@gmail.com wrote: Is Elasticsearch also supported on AIX and HP Itanium 11.31? I didn't find this information in the release notes or installation instructions. Thanks, Gaurav
[ANN] it’s {on}: announcing our first user conference – elastic{on}15
http://www.elasticsearch.org/blog/its-on-announcing-our-first-user-conference-elasticon15 Shay Banon November 20, 2014 It’s been a little over two years since we formed a company around Elasticsearch, and the engagement with our community, users, and customers has taken on a life of its own. There are now 90 meetup groups http://elasticsearch.meetup.com/ around the globe, hundreds of conferences featuring our products, and a growing list of events where our own developers engage audiences in the Elasticsearch story. It’s clear we hit a nerve. Over and over, we kept hearing one question: “When will Elasticsearch get a conference of its own?” We listened, and I am happy to announce the first Elasticsearch conference, Elastic{ON}15 http://www.elasticon.com/, is happening March 9 through 11, 2015 in San Francisco, California. The conference details are unfolding as we speak, but there are a few things we already have planned that I want to share with you. First, Elastic{ON}15 will be centered around Elasticsearch http://www.elasticsearch.com/products/elasticsearch/ and the ecosystem of products surrounding it, including Apache Lucene, Kibana http://www.elasticsearch.com/products/kibana/, Logstash http://www.elasticsearch.com/products/logstash/, the various client libraries http://www.elasticsearch.org/guide/en/elasticsearch/client/community/current/clients.html , Elasticsearch for Apache Hadoop http://www.elasticsearch.com/products/hadoop/,Marvel http://www.elasticsearch.com/products/marvel/, and Shield http://www.elasticsearch.com/products/shield/. Part of what makes Elasticsearch tick is the close communication we have with our users. To that extent, we’re doing a few things to make sure the conference is run the same way. What does this mean for you? It means that *all* the developers at our company (that’s right, every single one of them) will be attending the conference — and they want to hear from you. 
Elastic{ON}15 will feature a dedicated track that gives you a unique opportunity to talk with our engineers about all the work they currently do and plan to do. Afterwards, we’re coordinating an Elasticsearch dev all hands meeting where we’ll discuss your feedback and apply it to future products and events. The second aspect of the conference is hearing you, the user, speak about how you use our platform. I am lucky enough to be able to travel the world and talk to users and customers frequently, and am continuously amazed by how our products are being put to use. We plan to create a platform for our users, customers, and contributors in the community to talk about their use cases and successes. Elastic{ON}15 will be a great way to meet and talk with other users in your space and share knowledge. Please, if you’re interested, don’t hesitate to submit to speak http://www.elasticon.com/apex/Elastic_ON_Speak at the conference. We will also have a hands-on track with our developers, who will go through some high-level overviews and technical deep dives of our various products. Or you can drop by our “Agents of Elasticsearch” station to ask any questions that are on your mind. Bottom line: Elastic{ON}15 is all about you! And obviously, we plan to have a lot of fun while we’re together. I am super excited about the conference, and I hope you are as well. I would love to personally welcome each and every one of you to join us; it’s going to be great. (And make sure to sign up http://www.elasticon.com/apex/Elastic_ON_Signup to save your spot – my events team keeps reminding me that registration will fill up fast!) (Sent on behalf of the Elasticsearch Team)
query timing out
Hi all, hoping to get some help with this. I am trying to retrieve the latest tweet by a person. I'm using the javascript library, with the elastic.js library to help build the query. Here is the query generated: {query: {match: {talent_id:{query:546e50b989fe347230c4}}}, sort:[{post_date:{order:desc}}],size:1} When I run this query through curl it works just fine, but when I run it through the elasticsearch JS lib it times out regularly (not always, but a lot). The curl one comes back almost immediately, which I would expect. Any thoughts on why the JS lib times out? Or a better way to write my query above to get what I want? Thank you in advance! -warner
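For anyone wanting to reproduce, the generated query above corresponds to a curl call along these lines (the index name is omitted here, as in the original post; add yours before _search):

```shell
curl -XPOST 'localhost:9200/_search' -d '{
  "query": { "match": { "talent_id": { "query": "546e50b989fe347230c4" } } },
  "sort": [ { "post_date": { "order": "desc" } } ],
  "size": 1
}'
```

If the curl form is consistently fast, the JS client's requestTimeout setting (an option in the official elasticsearch-js client, 30 seconds by default if I recall correctly) and whether the client is pointed at the same node are two things worth checking.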
Re: Native script unable to get values, perhaps because it's a child doc? ES v1.1.1
Hi, did you index the field you want to use in the native script? Shiwen On Thursday, 20 November 2014 11:38:30 UTC-8, Jonathan Foy wrote: Hello, I have a native script that I'm using to score/sort queries and it is not working properly for one of my three types. All three types have the same nested field, and I'm using the script to check values and score/sort by an externally defined order. However, for one of the three types the values pulled from the doc fields are always zero/null (using docFieldLongs(fieldName).getValue() or docFieldStrings(stringValue).getValue()). I can check for the fields to be present using doc().containsKey(), and it seems to see them, but it never actually sees any values. I've pulled a few records manually and verified that the data looks good. The only thing I can think of that's different is that this one type is a child of one of the other two, but I'm querying it completely independently of the parent in this case. Does this sound familiar to anyone by any chance?
Odd behavior of bulk loading speed - good riddle?
So this has me perplexed. I have a bulk data loading job that creates an upsert statement and batches 500 of them in a bulk operation using the _bulk interface. I send the bulk insert via HTTP (on 9200) and wait for the response before sending the next one, which I do immediately. I do not hit any thread pool limits. I have replicas set to zero and the refresh interval set to -1 to make the loading as lightweight as possible. Timing these, they start out pretty fast and run at about 2000 documents per second (four or so HTTP round trips). This lasts for a few minutes and then it starts to slow. Within an hour, it's running at about 1200 per second. In another hour, it's down to about 600 per second. Then it seems to flatten out at about 400 per second until the job is done, some 8 million documents later. So my question is: why the slowdown? It's very consistent, seems reasonably linear, and happens 100% of the time. Any clues?
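For reference, the load-time settings mentioned above (zero replicas, refresh disabled) can be applied per index like this; myindex is a placeholder:

```shell
curl -XPUT 'localhost:9200/myindex/_settings' -d '{
  "index": { "number_of_replicas": 0, "refresh_interval": -1 }
}'
```

Remember to restore refresh_interval (e.g. to "1s") and the replica count once the load finishes. Segment-merge pressure building up over a long bulk load is one plausible cause of the gradual, linear slowdown described here.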
Re: Odd behavior of bulk loading speed - good riddle?
The statement, if that helps (this is a line of PHP, hence the $ variables; the escaped quotes are unescaped here for readability): {"script" : "ctx._source.auctionid=$auctionID; ctx._source.auctiontype=$auctionType; ctx._source.auctionstatus=$auctionStatus; ctx._source.auctionprice=$auctionPrice; ctx._source.auctionendtime='$auctionEndTime'; ctx._source.auctionadult=$adultListingFlag;", "upsert": { "auctionid": $auctionID, "auctiontype": $auctionType, "auctionstatus": $auctionStatus, "auctionprice": $auctionPrice, "auctionendtime": "$auctionEndTime", "auctionadult": $adultListingFlag, "domaintype": "auction", "fqdn": "$fqdn", "sld": "$sld", "tld": "$tld", "vendorid": 6, "price": 0, "commissionrate": 0, "isfasttransfer": false, "isadult": $aFlag, "istaboo": $tFlag, "sldlen": $sldlen, "numhyphens": $numhyphens, "numdigits": $numdigits, "tokens": . (($tokens == null) ? '' : json_encode($tokens)) . }} Creates a document if it doesn't exist, updates it if it does.
Increased query count after moving to nested documents
We have always indexed nested documents, but never fully used them since issue 3022 is still outstanding. We finally made the move to actually filtering documents at the nested level. Tracking metrics with graphite/grafana, I noticed immediately that the active/current query count is much higher, although the actual volume of queries has not changed. The overall query count is normal. Is using join queries increasing the number of queries reported? Cheers, Ivan
Re: Native script unable to get values, perhaps because it's a child doc? ES v1.1.1
Yep, and I can search on it in other queries. After testing most of the afternoon, I finally seem to have gotten it to work by pulling the field using the full name, including the nested path: Long value = docFieldLongs(nestedPath.propertyName).getValue(); This seems to work in all three places, including the two types that also worked without the nested path, which is good. Afterward I came across this SO post http://stackoverflow.com/questions/21289149/trouble-with-has-parent-query-containing-scripted-function-score?rq=1 which sounds like a similar problem, though I'm not in a has_parent/has_child query (though I AM in a child type). Sounds like it may be a bug. On Thursday, November 20, 2014 4:00:56 PM UTC-5, Shiwen Cheng wrote: Hi, did you index the field you want to use in the native script? Shiwen On Thursday, 20 November 2014 11:38:30 UTC-8, Jonathan Foy wrote: Hello, I have a native script that I'm using to score/sort queries and it is not working properly for one of my three types. All three types have the same nested field, and I'm using the script to check values and score/sort by an externally defined order. However, for one of the three types the values pulled from the doc fields are always zero/null (using docFieldLongs(fieldName).getValue() or docFieldStrings(stringValue).getValue()). I can check for the fields to be present using doc().containsKey(), and it seems to see them, but it never actually sees any values. I've pulled a few records manually and verified that the data looks good. The only thing I can think of that's different is that this one type is a child of one of the other two, but I'm querying it completely independently of the parent in this case. Does this sound familiar to anyone by any chance?
Re: Documentation for internals and architecture of Elasticsearch
Look at the videos from Berlin Buzzwords 2011 and 2012: http://www.elasticsearch.org/videos/page/3/ They are a great intro. Jörg On Thu, Nov 20, 2014 at 6:13 AM, Rahul Khengare rahulk1...@gmail.com wrote: Hi All, When we provide documents or data objects to Elasticsearch using the REST APIs, Elasticsearch stores the data locally or on some node in the ES cluster. I want to understand how Elasticsearch stores the data internally. Is there any documentation available on the architecture and storage mechanism? Thanks in advance. Regards, Rahul Khengare
Re: Getting file text content from mapper?
So that’s the expected behavior. The mapper attachment plugin only indexes the content; it never modifies the _source document. If you want to see the extracted text, you need to store the field and explicitly ask for it at query time using the fields option. Have a look here: https://github.com/elasticsearch/elasticsearch-mapper-attachments#highlighting-attachments -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr | @scrutmydocs https://twitter.com/scrutmydocs On 20 Nov 2014 at 20:14, Raymond Giorgi raymondgio...@gmail.com wrote: Also, this is the first line of what's posted along the river: { index: {_index:resumes,_type:resume,_id:2158912}} Things can get truncated when they're as big as a Base64 encoded file :) On Wednesday, November 19, 2014 6:01:29 PM UTC-5, Raymond Giorgi wrote: Hey all, I'm hoping someone can help me out with something I'm having an issue with. The short: I'm trying to extract plaintext from the attachment-mapper. The long: I'm posting the contents of a file, Base64 encoded, to RabbitMQ, which is feeding an ElasticSearch river plugin. Querying against the field works fine, but it only seems to store the Base64 encoding of the file instead of the plaintext. I'd like to extract the contents as plaintext and have that be returnable (i.e. query for the text of a docx). I'm feeding it from a PHP front end, so there are places in the app where I'd like to rely on Elasticsearch's built-in Tika processor. Thanks!
elasticsearch JAVA version and JDK version?
Hi all, I am new to the elasticsearch Java API and have some questions: 1) What is the minimum JDK to be used with elasticsearch Java API version 1.4.0? 2) Is there a version of the elasticsearch Java API that works with JDK 1.6.0? Thank you! Thong
Re: elasticsearch JAVA version and JDK version?
1.4.X is 1.7u55 or 1.8u20: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup.html#jvm-version You'd have to dig back through the older versions of the docs to find what is supported with Java 1.6.0, but I know 0.90.X was. On 21 November 2014 11:17, Thong Bui t...@rhapsody.com wrote: Hi all, I am new to the elasticsearch Java API and have some questions: 1) What is the minimum JDK to be used with elasticsearch Java API version 1.4.0? 2) Is there a version of the elasticsearch Java API that works with JDK 1.6.0? Thank you! Thong
Bool and And filter, which is faster?
In this article http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/, it says that bool is faster than and/or filters. But at that time it was elasticsearch 0.90. Is this still true? Thanks!
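For concreteness, the two forms being compared look like this inside a filtered query (the field names here are invented for illustration):

```json
{ "filtered": { "filter": { "bool": { "must": [
  { "term":  { "status": "active" } },
  { "range": { "price": { "gte": 10 } } }
] } } } }

{ "filtered": { "filter": { "and": [
  { "term":  { "status": "active" } },
  { "range": { "price": { "gte": 10 } } }
] } } }
```

The linked article's argument was that bool combines cached bitsets, while and/or/not evaluates its filters document by document; that is why bool usually won for bitset-friendly filters such as term and range.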
RE: 1.4.0 data node can't join existing 1.3.4 cluster
FYI, I have found a solution that works (at least for me). I’ve got a small cluster for testing, only 4 v1.3.5 nodes. What I’ve done is bring up 4X new v1.4.0 nodes as data-only machines. In the yaml I added a line to point the nodes via unicast explicitly to the current master: discovery.zen.ping.unicast.hosts: [10.210.9.224:9300] When I restarted elasticsearch with that setting, with cloud-aws installed and configured on version 2.4.0, the new nodes found the cluster and properly joined it. I will now start nuking the old v1.3.5 nodes to migrate the data off of them. Before the final 1.3.5 node is nuked, I will change the config on one of the v1.4.0 nodes to allow it as master and restart it. I’m not sure if the master stuff is needed or not, but I was very afraid of a split-brain problem. I have another 4-node testing cluster that I will be able to try this upgrade again with in a more controlled manner. I’m NOT looking forward to upgrading our current production cluster this way (15 data-only nodes, 3 master-only nodes). So it would appear that the problem is somewhere in the unicast discovery code. The question is who’s to blame? Elasticsearch or the cloud-aws plugin? From: Boaz Leskes [mailto:b.les...@gmail.com] Sent: Wednesday, November 19, 2014 2:27 PM To: elasticsearch@googlegroups.com Cc: Christian Hedegaard Subject: Re: 1.4.0 data node can't join existing 1.3.4 cluster Hi Christian, I'm not sure what thread you refer to exactly, but this shouldn't happen. Can you describe the problem you have some more? Anything in the nodes? (both the 1.4 node and the master) Cheers, Boaz On Wednesday, November 19, 2014 2:39:57 AM UTC+1, Christian Hedegaard wrote: I found this thread while trying to research the same issue and it looks like there is currently no resolution. We like to keep up on our elasticsearch upgrades as often as possible and do rolling upgrades to keep our clusters up. 
When testing I’m having the same issue; I cannot add a 1.4.0 box to the existing 1.3.4 cluster. Is there a fix for this anticipated?
Why ES node starts recovering all the data from other nodes after reboot?
I work on an experimental cluster of ES nodes running on Windows Server machines. Once in a while we have a need to reboot machines. The initial state: the cluster is green and well balanced. One machine is gracefully taken offline, and after the necessary service is performed it comes back online. All the hardware and file system content is intact. As soon as the ES service starts on that machine, it assumes that there is no usable data locally and recovers as much data as it deems necessary for balancing from other nodes. This behavior puzzles me, because most of the data shards stored on that machine's file system can be reused as they are. The cluster stores logs, so all indices except those for the current day never change until they get deleted. Can't an ES node detect that it has perfect copies of some (actually most) of the shards and, instead of copying them over, just mark them as up to date? I suspect I don't know about some step to enable this behavior and I'm looking to enable it. Any advice? Thank you! Konstantin
Re: Why ES node starts recovering all the data from other nodes after reboot?
You should disable allocation before you reboot, that will save a lot of shard shuffling - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-upgrade.html#rolling-upgrades On 21 November 2014 13:48, Konstantin Erman kon...@gmail.com wrote: [snip]
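For reference, in ES 1.0+ the allocation switch from the linked rolling-upgrade guide is a transient cluster setting. A minimal sketch of the restart dance (host/port assumed to be any node in the cluster; these are API request fragments, they need a running cluster):

```shell
# Before taking the node offline: stop shard allocation so the cluster
# does not start rebuilding the "missing" replicas on other nodes.
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "none" }
}'

# ... reboot / service the machine, wait for the node to rejoin ...

# Re-enable allocation so the returning node's local shard copies are reused.
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "all" }
}'
```

Note that transient settings are lost on a full cluster restart; use "persistent" instead if the whole cluster may go down during the maintenance window.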
Root type mapping not empty after parsing
Hi, I am trying to upgrade from ES 0.90.2 to 1.4.0. I am using the Java API to set the settings of the index; this was working fine with 0.90.2: client.admin().indices().prepareCreate(indexName).setSettings(_).execute().actionGet() Here are my settings: { "index": { "analysis": { "analyzer": { "keyword_lowercase": { "type": "custom", "tokenizer": "keyword", "filter": "lowercase" }, "standard_lowercase": { "type": "custom", "tokenizer": "standard", "filter": "lowercase" } } } } } I am now getting a "Root type mapping not empty after parsing" error. Can you please suggest any solutions?
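Not an answer to the Java-side error itself, but it can help to first verify that 1.4.0 accepts the analysis settings over REST, which isolates whether the problem is in the settings body or in how the Java client is passing it. A hedged sketch (index name hypothetical):

```shell
# Create a throwaway index with the same analyzers via the REST API.
# In the create-index request body the analysis block lives under "settings".
curl -XPUT 'localhost:9200/settings-test' -d '{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "keyword_lowercase":  { "type": "custom", "tokenizer": "keyword",  "filter": "lowercase" },
          "standard_lowercase": { "type": "custom", "tokenizer": "standard", "filter": "lowercase" }
        }
      }
    }
  }
}'
```

One possible cause (unconfirmed): if the settings document is passed somewhere ES interprets as a combined source (settings plus mappings), the unrecognized top-level keys may be parsed as type mappings, which would match the "Root type mapping not empty after parsing" message.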
Re: Why ES node starts recovering all the data from other nodes after reboot?
If you do disable allocation before you reboot a node, and a client writes to a shard that had a replica on that node, does the entire replica get copied when the node comes back up? Or does it just get updated? On Thursday, 20 November 2014 19:52:26 UTC-7, Mark Walkom wrote: [snip]
Re: priming data for a new node
So if a shard has been updated since the data copy, will it copy the entire shard, or just update it? On Wednesday, 19 November 2014 23:34:01 UTC-7, Mark Walkom wrote: It doesn't copy everything, only what it needs to balance the shards. On 20 November 2014 17:20, Yves Dorfsman yv...@zioup.com wrote: When adding a new node to a cluster, is there a way to prevent it from having to copy all the data from the other nodes? We tried to copy the data on disk from an existing node (one that had all the data for the given indices), but it still copied everything. Is there a way to make it update only what is new? Thanks. -- http://yves.zioup.com gpg: 4096R/32B0F416
Re: Why ES node starts recovering all the data from other nodes after reboot?
It will enter recovery, where it syncs at the segment level from the current primary; then the translog gets shipped over and (re)played, which brings it all up to date. On 21 November 2014 14:51, Yves Dorfsman y...@zioup.com wrote: [snip]
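The recovery described above can be watched per shard with the cat API available in 1.x (host assumed local; needs a running cluster):

```shell
# Shows each recovering shard's source/target node, stage, and bytes,
# making it visible how much data is actually being copied over.
curl -XGET 'localhost:9200/_cat/recovery?v'
```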
Re: Why ES node starts recovering all the data from other nodes after reboot?
The thing is that this is a disk-level operation. It pretty much rsyncs the files from the current primary shard to the node when it comes back online. This would be OK if the replica shards matched the primary, but that is normally only the case if the shard was moved to the node after it was mostly complete and there have been only a few writes since. Normally shards don't match each other, because the way the index is maintained is nondeterministic. The translog replay is only used as a catch-up after the rsync-like step. This is something that is being worked on. It's certainly my biggest complaint about Elasticsearch, but I'm confident that it'll get better. Nik On Nov 20, 2014 11:11 PM, Mark Walkom markwal...@gmail.com wrote: [snip]
understanding terms syntax
Hi all, I'm having the following scenario (Elasticsearch 1.0): the query query: { "term": { "ac": "3A822F3B-3ECF-4463-98F86DF6DE28EC5C" } } yields no results, but this works: query: { "query_string": { "default_field": "ac", "query": "3A822F3B-3ECF-4463-98F86DF6DE28EC5C" } } The problem is that when I combine it with a must_not or not filter, I still get the same results. What is the correct syntax I need? Thanks GX
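A likely explanation, hedged since the mapping isn't shown: with the default standard analyzer the ac field is lowercased and split on the hyphens at index time, so a term query (which is not analyzed) for the mixed-case full UUID matches nothing, while query_string analyzes its input the same way the field was analyzed and therefore matches. A sketch of two ways around it (index name hypothetical; these need a running cluster):

```shell
# Option 1: keep the analyzed field and use match, which analyzes the
# input like query_string does. Caveat: inside must_not this excludes any
# document containing ANY of the UUID's tokens, which can over-exclude.
curl -XGET 'localhost:9200/myindex/_search' -d '{
  "query": {
    "bool": {
      "must_not": { "match": { "ac": "3A822F3B-3ECF-4463-98F86DF6DE28EC5C" } }
    }
  }
}'

# Option 2 (more robust for IDs): map the field as not_analyzed, then
# term queries compare the exact stored value and must_not behaves as expected:
#   "ac": { "type": "string", "index": "not_analyzed" }
curl -XGET 'localhost:9200/myindex/_search' -d '{
  "query": {
    "bool": {
      "must_not": { "term": { "ac": "3A822F3B-3ECF-4463-98F86DF6DE28EC5C" } }
    }
  }
}'
```

Option 2 requires reindexing, since the mapping of an existing field cannot be changed in place.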