Re: Problem with keeping in sync Elasticsearch across two data centers
Thanks so much everyone for sharing your thoughts! -Amit.

On Sun, Feb 23, 2014 at 10:24 AM, Hariharan Vadivelu harii...@gmail.com wrote:

I think with the current ES version you have 3 options:
- Use the snapshot and restore feature to snapshot from one DC and restore in the other
- Index into both DCs (two distinct clusters) at the client level
- Use the Tribe node feature to search or index across multiple clusters
Reference post: https://groups.google.com/forum/#!searchin/elasticsearch/TribeNodes/elasticsearch/MG1RerVSWOk/qZFWvr0HPSwJ

On Saturday, February 22, 2014 6:03:13 PM UTC-6, Michael Sick wrote:

Hi Amit, Ivan is correct. You might also check out Tribe nodes (http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/modules-tribe.html) and see if they fit your needs for cross-DC replication. --Mike

On Sat, Feb 22, 2014 at 1:32 PM, Amit Soni amits...@gmail.com wrote:

Hello Michael - understood that ES is not built to maintain consistent cluster state across data centers. What I am wondering is whether there is a way for Elasticsearch to continue to replicate data to a different data center (with some delay, of course), so that when the primary center fails, the failover data center still has most of the data (except perhaps the last few seconds/minutes/hours). Overall I am looking for the right way to implement a cross-data-center deployment of Elasticsearch! -Amit.

On Fri, Feb 21, 2014 at 9:37 AM, Michael Sick michae...@serenesoftware.com wrote:

Dario, I believe that you're looking for Tribe nodes: http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/modules-tribe.html. ES is not built to consistently cluster across DCs / larger network lags.

On Fri, Feb 21, 2014 at 11:24 AM, Dario Rossi dari...@gmail.com wrote:

Hi, I have the following problem: our application publishes content to an Elasticsearch cluster. We use local data-less nodes for querying Elasticsearch, so we don't use HTTP REST and the local nodes act as the load balancer. Now there is a requirement to replicate the cluster to another data center too (and maybe more in the future) for resilience.

At the very beginning we thought of having one large cluster spanning data centers (crazy). This solution has the following problems:
- The cluster has the split-brain problem (!)
- The client data-less node will try to make requests across data centers (is there a solution to this?). I can't find a way to avoid it, and we don't want it because of a) latency and b) firewalling issues.

So we started to think this solution is not really viable. We then thought of having one cluster per data center, which seems more sensible. But then we must publish data to all clusters, and if one fails we have no means of rolling back (unless we set up a complicated version-based rollback system). I find this very complicated and hard to maintain, although somewhat doable. My biggest problem is that we have to keep the data centers in the same state at all times, so that if one goes down we can readily switch to the other. Any ideas, or can you recommend some support to help us deal with this?

-- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.
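Hariharan's second option - indexing into both DCs from the client - is where Dario's rollback worry bites. A common compromise is to not roll back at all: write to every cluster, and queue any failed write for replay once that DC comes back. A minimal sketch of that idea; the function and queue names are illustrative, not any Elasticsearch API:

```python
def dual_index(doc, writers, replay_queue):
    """Try to index `doc` into every cluster; queue failures for later replay.

    `writers` maps a cluster name to a callable that indexes one document;
    `replay_queue` collects (cluster, doc) pairs for a background re-send job.
    """
    for name, write in writers.items():
        try:
            write(doc)
        except Exception:
            # Cluster unreachable: remember the write so a background job can
            # re-send it once the data center returns (no rollback needed).
            replay_queue.append((name, doc))
```

The clusters converge eventually instead of being transactionally in sync, which matches Amit's "most of the data, maybe minus the last few minutes" requirement.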
Re: upgrade to elasticsearch 1.0 now ClassCastException: class ElasticSearch090PostingsFormat
Hey, you don't have two Lucene versions in your project by accident, right? I would like to know more about that class cast exception, but I fear this is the most verbose output you get? --Alex

On Tue, Feb 25, 2014 at 8:03 AM, Kevin J. Smith ke...@rootsmith.ca wrote:

Hi, I am using Elasticsearch embedded in a Tomcat 7 webapp container (everything running under Java 7). All libs for Elasticsearch are in WEB-INF/lib. In v0.90 everything is running swimmingly. We upgraded to v1.0 (libs and all, and paid attention to breaking API changes), but now on Ubuntu Linux, when I make a call to create an index via the following:

final CreateIndexResponse response = _client.admin().indices().prepareCreate(index).setSource(mapping).execute().actionGet();

I get the following exception:

org.elasticsearch.common.util.concurrent.UncategorizedExecutionException: Failed execution
    at org.elasticsearch.action.support.AdapterActionFuture.rethrowExecutionException(AdapterActionFuture.java:90)
    at org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:50)
    at com.bitstew.search.SearchNode.createIndex(SearchNode.java:1507)
    at com.bitstew.search.SystemInit.loadIndexDefinition(SystemInit.java:206)
    at com.bitstew.search.SystemInit.loadIndex(SystemInit.java:81)
    at com.bitstew.search.SystemInit.loadIndices(SystemInit.java:52)
    at com.bitstew.ws.servlet.SystemAction.loadIndices(SystemAction.java:1798)
    at com.bitstew.ws.servlet.SystemAction.executeAction(SystemAction.java:383)
    at com.bitstew.ws.servlet.WebServicesDeployer.service(WebServicesDeployer.java:1888)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.tuckey.web.filters.urlrewrite.RuleChain.handleRewrite(RuleChain.java:176)
    at org.tuckey.web.filters.urlrewrite.RuleChain.doRules(RuleChain.java:145)
    at org.tuckey.web.filters.urlrewrite.UrlRewriter.processRequest(UrlRewriter.java:92)
    at org.tuckey.web.filters.urlrewrite.UrlRewriteFilter.doFilter(UrlRewriteFilter.java:381)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:502)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.NoClassDefFoundError: org/apache/lucene/codecs/PostingsFormat
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1701)
    at
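A quick way to check Alex's duplicate-jar theory is to scan WEB-INF/lib for any artifact that appears with more than one version. A small sketch; the regex for Maven-style jar names (artifact-version.jar) is an assumption and will not cover every naming scheme:

```python
import os
import re
from collections import defaultdict

# Maven-style jar names: <artifact>-<version>.jar, e.g. lucene-core-4.6.1.jar
JAR_NAME = re.compile(r"^(?P<artifact>[a-zA-Z][\w.-]*?)-(?P<version>\d[\w.]*)\.jar$")

def duplicate_jars(lib_dir):
    """Return {artifact: sorted versions} for artifacts present more than once."""
    versions = defaultdict(list)
    for name in os.listdir(lib_dir):
        m = JAR_NAME.match(name)
        if m:
            versions[m.group("artifact")].append(m.group("version"))
    return {a: sorted(v) for a, v in versions.items() if len(v) > 1}
```

If this reports two lucene-core versions, the old 0.90-era Lucene jars are still on the classpath next to the 1.0 ones, which would explain the PostingsFormat NoClassDefFoundError.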
When a Java process with an ES Client terminates, does it automatically close the connection?
Hi all, a question on the Java API for Elasticsearch. When a Java process with an ES Client terminates, does it automatically close the connection? Or should we explicitly close the connection to save resources? Best regards, Arinto
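Whether or not process exit tears the sockets down for you, closing the client explicitly releases its thread pools and connections deterministically, which is the safer habit (in the Java client the call is client.close()). The shape of the pattern, sketched in Python for brevity - the Client class here is a stand-in for illustration, not the real API:

```python
import atexit

class Client:
    """Stand-in for an Elasticsearch client holding network resources."""
    def __init__(self):
        self.closed = False

    def close(self):
        # In the real client this shuts down thread pools and sockets.
        self.closed = True

client = Client()
# Register an explicit close so resources are released on normal exit,
# instead of relying on process teardown to reclaim them.
atexit.register(client.close)
```

The Java equivalent is a JVM shutdown hook (Runtime.getRuntime().addShutdownHook) that calls client.close(), or simply closing the client in a finally block.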
Re: engine failure, message [OutOfMemoryError[unable to create new native thread]]
Thanks for your response Jörg - somehow missed replying earlier. For some strange reason, the max threads setting was reset when I did a reboot, so I had to set it back to a high number.

On Tue, Feb 11, 2014 at 12:10 AM, joergpra...@gmail.com wrote:

Your user ran out of thread/process space; this is reported as an OOM in Java. You can check the nproc entry in /etc/security/limits.conf for the maximum settings and compare this with the process table. The OS settings regarding threads are usually ok and should not be modified. Check if you have modified the ES default settings for the thread pools, and revert these changes to the defaults. If this does not help, you should upgrade from 0.90.6 to 0.90.11. Jörg

On Tue, Feb 11, 2014 at 6:45 AM, T Vinod Gupta tvi...@readypulse.com wrote:

Hi, I had a stable ES cluster on AWS EC2 instances till a week ago, and I don't know what's going on - my cluster keeps getting into a bad state every few hours. The error says OOM, but I know that is not the reason: the instance has enough heap space left. I'm running ES 0.90.6 and giving half the RAM (8gb) to the ES process, and I see these messages (the same kind of message) in the logs on all the machines in the cluster:

[2014-02-11 03:17:39,936][WARN ][cluster.action.shard ] [Star-Dancer] [facebook_022014][1] sending failed shard for [facebook_022014][1], node[zO9Pc1GNSuiVMA_Kn2b3UQ], [R], s[STARTED], indexUUID [qN3CUSfVS-m2KlgQQtOqxg], reason [engine failure, message [OutOfMemoryError[unable to create new native thread]]]

Any ideas on how to debug this or figure out what's causing it would be really helpful.
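As Jörg says, this flavour of OOM means the per-user process/thread limit (nproc) was hit, not the heap. A small helper to pull the nproc entries out of limits.conf-style text, so they can be compared against the live process count; this is a simplified parser (it ignores included files and @group subtleties):

```python
def nproc_limits(limits_conf_text):
    """Extract (domain, type, value) for nproc entries from limits.conf text.

    limits.conf lines have the format: <domain> <type> <item> <value>
    """
    entries = []
    for line in limits_conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        fields = line.split()
        if len(fields) == 4 and fields[2] == "nproc":
            entries.append((fields[0], fields[1], fields[3]))
    return entries
```

Comparing the hard nproc value against the output of something like `ps -eLf | grep elasticsearch | wc -l` shows how close the ES user is to the ceiling.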
long GC pauses, but only on 1 host in the cluster
I'm seeing this consistently happen on only 1 host in my cluster; the other hosts don't have this problem. What could be the reason, and what's the remedy? I'm running ES on an EC2 m1.xlarge host - 16GB RAM on the machine, of which I allocate 8GB to ES. e.g.

[2014-02-25 09:14:38,726][WARN ][monitor.jvm ] [Lunatica] [gc][ParNew][1188745][942327] duration [48.3s], collections [1]/[1.1m], total [48.3s]/[1d], memory [7.9gb]->[6.9gb]/[7.9gb], all_pools {[Code Cache] [14.5mb]->[14.5mb]/[48mb]}{[Par Eden Space] [15.7mb]->[14.7mb]/[66.5mb]}{[Par Survivor Space] [8.3mb]->[0b]/[8.3mb]}{[CMS Old Gen] [7.8gb]->[6.9gb]/[7.9gb]}{[CMS Perm Gen] [46.8mb]->[46.8mb]/[168mb]}

thanks
Re: upgrade to elasticsearch 1.0 now ClassCastException: class ElasticSearch090PostingsFormat
Maybe there are two Elasticsearch jar versions in the classpath. Jörg
Re: long GC pauses, but only on 1 host in the cluster
Depends on a lot of things: Java version, ES version, doc size and count, index size and count, number of nodes. What are you monitoring the cluster with, as well?

Regards,
Mark Walkom
Infrastructure Engineer, Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com

On 25 February 2014 20:21, T Vinod Gupta tvi...@readypulse.com wrote:

I'm seeing this consistently happen on only 1 host in my cluster; the other hosts don't have this problem. What could be the reason, and what's the remedy? I'm running ES on an EC2 m1.xlarge host - 16GB RAM on the machine, of which I allocate 8GB to ES.
Re: Is there a difference between indexing envelopes or polygons?
Hey, thanks a lot! Now it works just fine. I didn't see that coming - I thought ES would complain if the envelope's coordinates were reversed. My bad... Nicolas

On Monday, February 24, 2014 15:58:24 UTC+1, Alexander Reelsen wrote:

Hey, if there is an error, can you please open a github issue? However, the envelope shape expects you to set an upper-left and a lower-right boundary. Your coordinates look more like lower-left and upper-right (meaning you might actually create quite a huge envelope) - which obviously does not matter for a polygon. --Alex

On Sat, Feb 22, 2014 at 11:14 AM, Nicolas THOMASSON nico.th...@gmail.com wrote:

Hello, I'm new to ES, so please forgive me if I'm asking something stupid. Is there a fundamental difference between indexing an envelope and indexing a polygon? For example, if I define the area as an envelope:

{ "frame": { "type": "envelope", "coordinates": [[3,4],[1,2]] } }

or as a polygon:

{ "frame": { "type": "polygon", "coordinates": [[[3,4],[3,2],[1,2],[1,4],[3,4]]] } }

As in my understanding they both define the same area, should I be able to perform the same queries whichever way I defined the area? (Currently I have a search query that returns wrong results on the envelope and seems to perform well on the polygon.) Thanks for your help, Nicolas
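Since the envelope type expects its two points as [upper-left, lower-right], i.e. [[min_lon, max_lat], [max_lon, min_lat]], a small client-side normalization step avoids the reversed-corner mistake entirely. A minimal sketch:

```python
def normalize_envelope(coords):
    """Rewrite any two opposite corners as [upper-left, lower-right],
    the order the geo_shape envelope type expects."""
    (lon1, lat1), (lon2, lat2) = coords
    return [[min(lon1, lon2), max(lat1, lat2)],   # upper-left
            [max(lon1, lon2), min(lat1, lat2)]]   # lower-right
```

Running the coordinates from the question through it turns [[3,4],[1,2]] into [[1,4],[3,2]], which indexes as the small box the polygon version describes.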
fragment_size not used for simple queries
Hello, using the highlight API for a simple query like this:

curl localhost:9200/company_52fb7b90c8318c4dc86b/_search -d '{
  "fields": [],
  "query": { "filtered": { "query": { "match": { "_all": "i do not" } } } },
  "highlight": {
    "fields": {
      "metadatas.*": { "number_of_fragments": 1, "fragment_size": 20 }
    }
  }
}'

This should return snippets whose size does not exceed 20 characters. Most of the time this works, however I do have one document, analyzed with the same mappings, which yields really long snippets - in fact, the snippet is not truncated and contains all the text. Here is a sample working as expected:

{"took":21,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":19,"max_score":0.24860834,"hits":[{"_index":"company_52fb7b90c8318c4dc86b","_type":"document","_id":"5309c5949ba7daaa265ffdd8","_score":0.24860834,"highlight":{"metadatas.text":[", and <em>do</em> not hesitate"]}},{"_index":"company_52fb7b90c8318c4dc86b","_type":"document","_id":"5309c5949ba7daaa265ffdd6","_score":0.14883985,"highlight":{"metadatas.text":[" take his child.\n<em>I</em> <em>do</em>"]}},{"_index":"company_52fb7b90c8318c4dc86b","_type":"document","_id":"5309c57a9ba7daaa265ffdc8","_score":0.1365959,"highlight":{"metadatas.text":[" resident of DC, <em>I</em> am"]}}]}}

And here is the unruly one:

{"took":122,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":19,"max_score":0.24860834,"hits":[{"_index":"company_52fb7b90c8318c4dc86b","_type":"document","_id":"5309c5949ba7daaa265ffdd8","_score":0.24860834,"highlight":{"metadatas.text":[", and <em>do</em> not hesitate"]}},{"_index":"company_52fb7b90c8318c4dc86b","_type":"document","_id":"5309c5949ba7daaa265ffdd6","_score":0.14883985,"highlight":{"metadatas.text":[" take his child.\n<em>I</em> <em>do</em>"]}},{"_index":"company_52fb7b90c8318c4dc86b","_type":"document","_id":"5309c57a9ba7daaa265ffdc8","_score":0.1365959,"highlight":{"metadatas.text":[" resident of DC, <em>I</em> am"]}},{"_index":"company_52fb7b90c8318c4dc86b","_type":"document","_id":"5309c57a9ba7daaa265ffdc7","_score":0.13437755,"highlight":{"metadatas.text":[".\n<em>I</em> <em>do</em> not enlighten those who are not eager to learn, nor arouse\nthose who are not anxious to give an explanation themselves. If <em>I</em>\nhave presented one corner of the square and they cannot come\nback to me with the other three, <em>I</em> should not go over the points\nagain.\n― Confucius\nBesides explaining JavaScript, this book tries to be an introduction to the basic\nprinciples of programming. Programming, it turns out, is hard. The\nfundamental rules are, most of the time, simple and clear. But programs,\nwhile built on top of these basic rules, tend to become complex enough to\nintroduce their own rules, their own complexity. Because of this, programming\nis rarely simple or predictable. As Donald Knuth, who is something of a\nfounding father of the field, says, it is an art.\nTo get something out of this book, more than just passive reading is required.\nTry to stay sharp, make an effort to solve the exercises, and only continue on\nwhen you are reasonably sure you understand the material that came before.\nThe computer programmer is a creator of universes for which he\nalone is responsible. Universes of virtually unlimited complexity can\nbe created in the form of computer programs.\n― Joseph Weizenbaum, Computer Power and Human Reason\nA program is many things. It is a piece of text typed by a programmer, it is\nthe directing force that makes the computer <em>do</em> what it does, it is data in the\ncomputer's memory, yet it controls the actions performed on this same\nmemory. Analogies that try to compare programs to objects we are familiar\nwith tend to fall short, but a superficially fitting one is that of a machine. The\ngears of a mechanical watch fit together ingeniously, and if the watchmaker\nwas any good, it will accurately show the time for many years. The elements\nof a program fit together in a similar way, and if the programmer knows what\nhe is doing, the program will run without crashing.\nA computer is a machine built to act as a host for these immaterial machines.\nComputers themselves can only <em>do</em> stupidly straightforward things. The reason\nthey are so useful is that they <em>do</em> these things at an incredibly high speed. A\nprogram can, by ingeniously combining many of these simple actions, <em>do</em> very\ncomplicated things.\nTo some of us, writing computer programs is a fascinating game. A program\nis a building of thought. It is costless to build, weightless, growing easily under\nour typing hands. If we get carried away, its size and complexity will grow out\nof control, confusing even the one who created it. This is the main problem of\nprogramming. It is why so much of today's software tends to crash, fail,\nscrew up.\nWhen a program works, it is beautiful. The art of programming is the skill of\ncontrolling complexity. The great program is subdued, made simple in its\ncomplexity.\nToday, many programmers believe
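Until the cause of the untruncated fragment is found, a defensive client-side cap on snippet length is cheap. A workaround sketch, not a fix for the underlying highlighter behaviour: it counts visible characters (skipping the <em>/</em> markup) and re-closes a tag that truncation cut off.

```python
def cap_fragment(fragment, limit=20):
    """Truncate a highlight fragment to `limit` visible characters,
    not counting the <em>/</em> highlight markup."""
    out, visible, i = [], 0, 0
    while i < len(fragment) and visible < limit:
        if fragment.startswith("<em>", i):
            out.append("<em>")
            i += 4
        elif fragment.startswith("</em>", i):
            out.append("</em>")
            i += 5
        else:
            out.append(fragment[i])
            visible += 1
            i += 1
    if out.count("<em>") > out.count("</em>"):
        out.append("</em>")  # close an <em> left open by the truncation
    return "".join(out)
```

Applied to the unruly hit's fragment, this restores the 20-visible-character behaviour the fragment_size setting was expected to give.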
Re: long GC pauses, but only on 1 host in the cluster
Is this node showing more activity than the others? What kind of workload is this - indexing, search? Are caches used, for filters/facets? Full GC runs caused by CMS Old Gen may be a sign that you are close to the memory limit and need to add nodes, but it could also mean a lot of other things. Jörg
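The numbers in the warning above already tell the story Jörg describes: the old gen went from 7.8gb to 6.9gb out of a 7.9gb ceiling, so the heap is nearly full even after a collection. A sketch that pulls the before/after/max figures out of such a log line; the regex assumes the [before]->[after]/[max] layout of the monitor.jvm warning and gb units:

```python
import re

MEM = re.compile(r"memory \[([\d.]+)gb\]->\[([\d.]+)gb\]/\[([\d.]+)gb\]")

def heap_figures(log_line):
    """Return (before_gb, after_gb, max_gb) from a monitor.jvm GC warning,
    or None if the line does not match."""
    m = MEM.search(log_line)
    if not m:
        return None
    return tuple(float(x) for x in m.groups())

# The warning from the thread (all_pools detail omitted):
line = ("[2014-02-25 09:14:38,726][WARN ][monitor.jvm] [Lunatica] "
        "[gc][ParNew][1188745][942327] duration [48.3s], collections [1]/[1.1m], "
        "total [48.3s]/[1d], memory [7.9gb]->[6.9gb]/[7.9gb]")
```

A post-collection occupancy of 6.9/7.9gb (~87%) is exactly the "close to the memory limit" signal: CMS then runs back-to-back and ParNew stalls for tens of seconds.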
Re: Problem with keeping in sync Elasticsearch across two data centers
I will try the tribe node feature, even if I don't understand it completely... but I think it deserves some experimentation.

On Tuesday, February 25, 2014 08:05:05 UTC, amit.soni wrote:

Thanks so much everyone for sharing your thoughts! -Amit.

On Sun, Feb 23, 2014 at 10:24 AM, Hariharan Vadivelu hari...@gmail.com wrote:

I think with the current ES version you have 3 options:
- Use the snapshot and restore feature to snapshot from one DC and restore in the other
- Index into both DCs (two distinct clusters) at the client level
- Use the Tribe node feature to search or index across multiple clusters
Reference post: https://groups.google.com/forum/#!searchin/elasticsearch/TribeNodes/elasticsearch/MG1RerVSWOk/qZFWvr0HPSwJ
Expanding terms
Hello, I'm trying to find a way to:

1. expand a term - get all words, with counts, that are relevant for a term (or terms)
2. get relevant words for a query - a list of all words that are highlighted
3. get phrases by word - e.g. for the word "war": "world war", "second word war", "the second word war"

And a more complicated one: is there a way to get/highlight only the words that are relevant for multiple term conditions, e.g.

"must": { "wildcard": { "content_morfo": { "value": "v*" } }, "wildcard": { "content_morfo": { "value": "==AA*==" } } }

thx Petr
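A side note on the last snippet: a JSON object cannot carry two "wildcard" keys - most parsers silently keep only the last one - so "must" should be an array of clauses, one wildcard each. A sketch of the corrected shape (the field name content_morfo is taken from the question; this builds the query body, it does not run it):

```python
def two_wildcards(field, patterns):
    """Build a bool query whose `must` is a list with one wildcard per pattern."""
    return {
        "query": {
            "bool": {
                "must": [{"wildcard": {field: {"value": p}}} for p in patterns]
            }
        }
    }

query = two_wildcards("content_morfo", ["v*", "==AA*=="])
```

With both clauses in the array, a document must match both wildcard patterns, which is what the question's "must" seems to intend.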
Is consistent scoring across 2 documents that match either 1 of 2 properties possible?
Hi, we've been struggling with this for a few days now, so I think it is time to pick the expert brains. Probably best to explain by delving straight into an example:

1) Assume we have the following document:

{ "id": "/people/person1", "dob": "1980-04-12", "fullname": "Mickey Arthur Mouse", "aliasfullname": "Mickey Bernard Mouse" }

2) When we run this search:

{ "query": { "bool": { "should": [ { "match": { "fullname": { "query": "mickey arthur mouse" } } }, { "match": { "aliasfullname": { "query": "mickey arthur mouse" } } } ] } } }

we get score 13.37 (for example).

3) Now assume we have this document (the same as above, except with no aliasfullname):

{ "id": "/people/person1", "dob": "1980-04-12", "fullname": "Mickey Arthur Mouse" }

4) When we run the search from step 2, we get score 3.76 (for example).

How can we ensure that if a search is done on either the real name or the alias name (we won't know which is being searched on), a person with an alias does not score higher than someone without one? What type of query could we use to ensure that both searches return the same score? We've tried dis_max and have omit_norms: true on the searched fields, but nothing gives the same score, so I am beginning to wonder if it is an unrealistic expectation. Any assistance/advice would be greatly appreciated.
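For background on why the scores differ so much: a bool/should query sums the scores of the matching clauses, so a document matching both fullname and aliasfullname collects roughly two clauses' worth of score, while dis_max scores a document by its best single clause plus tie_breaker times the rest - with the default tie_breaker of 0, an alias match can no longer inflate the score when the clause scores are equal. Reduced to arithmetic (illustrative numbers, not real Lucene scores, and ignoring 1.x coord/IDF effects that can still cause small differences):

```python
def bool_should(scores):
    """bool/should combination: matching clause scores are summed."""
    return sum(scores)

def dis_max(scores, tie_breaker=0.0):
    """dis_max combination: best clause wins; the other matching clauses
    contribute only tie_breaker * their score."""
    best = max(scores)
    return best + tie_breaker * (sum(scores) - best)

with_alias = [3.7, 3.7]   # doc matches both fullname and aliasfullname
without_alias = [3.7]     # doc matches fullname only
```

Under bool/should the aliased document scores about twice as high; under dis_max with tie_breaker 0 both documents score the same, which is the behaviour the question asks for.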
Re: ES doesn't take into account field level boost in prefix query over catch-all field?
Hi All. I have the same issue and would highly appreciate an answer. Many Thanks! Maxim
Re: Is there a difference between indexing envelopes and polygons?
Hey, the problem here is that elasticsearch can't tell by itself if the envelope borders need to be reverted or not... maybe you want/need such an envelope in your calculations. Hard to tell from a machine perspective :-) --Alex On Tue, Feb 25, 2014 at 10:38 AM, Nicolas THOMASSON nico.thomas...@gmail.com wrote: Hey thanks a lot! Now it works just fine. I didn't see that coming, I thought ES was complaining if the envelope's coordinates were reverted. My bad... Nicolas On Monday, February 24, 2014 15:58:24 UTC+1, Alexander Reelsen wrote: Hey, if there is an error, can you please open a github issue? However, the envelope shape expects you to set an upper left and lower right boundary. Your coordinates look more like lower left and upper right (meaning you might actually create quite a huge envelope) - which obviously does not matter for a polygon --Alex On Sat, Feb 22, 2014 at 11:14 AM, Nicolas THOMASSON nico.th...@gmail.com wrote: Hello, I'm new to ES. Please forgive me if I'm asking something stupid. Is there a fundamental difference between indexing an envelope and indexing a polygon? For example, if I define the area as an envelope { frame:{ type:envelope, coordinates: [[3,4],[1,2]] } } or as a polygon { frame:{ type:polygon, coordinates: [[[3,4],[3,2],[1,2],[1,4],[3,4]]] } } As in my understanding they both define the same area, should I be able to perform the same queries whichever way I defined the area? (Currently I have a search query that returns wrong results on the envelope and seems to perform well on the polygon.) Thanks for your help, Nicolas
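[Editor's note] Per Alex's explanation, an envelope is specified as [upper left, lower right] in [lon, lat] order. Applying that to the coordinates in the question (lon in [1,3], lat in [2,4]), the envelope equivalent to the polygon would be the following sketch, with the field name frame taken from the question:

```json
{
  "frame": {
    "type": "envelope",
    "coordinates": [ [1, 4], [3, 2] ]
  }
}
```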
Re: [Hadoop] Any good tut to start with?
Hi Costin, I did not see the video. It's a good starting point. I'm not a big fan of videos though. I might reproduce it using the Hortonworks sandbox. Regards, Yann Barraud 2014-02-24 13:35 GMT+01:00 Costin Leau costin.l...@gmail.com: Have you looked at the video? It does exactly that. Is there something missing? On 2/24/2014 12:41 PM, Yann Barraud wrote: Hi Costin, What I'd love to see is a step-by-step tut to have ES and Hadoop working together. Is there somewhere I can find something like this? Regards, Yann On Thursday, February 20, 2014 16:25:28 UTC+1, John Pauley wrote: Any more tutorials, say append to list? On Wednesday, February 19, 2014 12:54:15 PM UTC-5, Costin Leau wrote: Hi, We tried to make the docs friendly in this regard - each section (from Map/Reduce to Pig) has several examples. There's also a short video which guides you through the various features (with code), available here [1]. Hope this helps, [1] http://www.elasticsearch.org/videos/search-and-analytics-with-hadoop-and-elasticsearch/ On 19/02/2014 5:11 PM, Yann Barraud wrote: Hi everyone, Do you have a good pointer to a tut to start playing with ES Hadoop? Using the Hortonworks VM for example? Thanks. Cheers, Yann
-- Costin
Re: Problem with keeping in sync Elasticsearch across two data centers
From the docs it is not clear whether, with two clusters holding the same indices, an indexing operation will take effect on both... There is a line that leaves me a bit doubtful: However, there are a few exceptions: - The merged view cannot handle indices with the same name in multiple clusters. It will pick one of them and discard the other. On Tuesday, February 25, 2014 10:04:05 UTC, Dario Rossi wrote: I will try the tribe node feature, even if I don't understand it completely... but I think it deserves some experimentation.
dumping index is slow as hell
Hey guys, I needed to migrate an index to a new cluster and after a lot of hesitating I decided to give taskrabbit's elasticsearch-dump a try: https://github.com/taskrabbit/elasticsearch-dump I tested it with 10k documents, which worked fine, so I decided to migrate the real data to the new cluster with the following command: elasticdump --input=http://oldcluster:9200/my_index \ --output=http://newcluster:9200/my_index my_index contains ~5 million documents, so I expected it to take a while, but not *this* long. It's been running since 10 AM UTC+1 yesterday and has migrated only a bit over 1.5 million docs so far - in roughly 28 hours. When it started, it indexed around 100 docs per second; by the time I went home from work (around 5 PM UTC+1), it was down to around 30 docs/s, and now it's around 10 docs/s. Being a newbie with ElasticSearch, I don't even know how to diagnose the reason for this slowness. Could you help me with this? Keep in mind that I'm at work for 2 or 3 more hours today, but after that, I won't have access to the servers until next Monday. Feel free to suggest anything in that time too; I will read it and try to reply, but I can't look into anything or do anything about it. Regards, Attila Bukor
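[Editor's note] A pattern like this - fast at first, then steadily degrading - is typical of tools that page through results with from/size, where each successive page gets more expensive as the offset grows. Elasticsearch's scan-and-scroll API keeps the cost per batch roughly constant. A sketch of the initial request (index name taken from the message above; the scroll timeout and batch size are illustrative):

```json
POST /my_index/_search?search_type=scan&scroll=5m
{ "query": { "match_all": {} }, "size": 500 }
```

Each response carries a _scroll_id; passing it to the _search/scroll endpoint (with the same scroll timeout) returns the next batch until no hits remain, and the batches can then be fed to the new cluster's _bulk endpoint.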
Re: Aggregation on parent/child documents
We run 4 instances of ES 1.0.0 using 30G for the JVM. We run 64-bit OpenJDK 1.7.0_25 on ubuntu servers.

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 515139
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 64000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 515139
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

And I also disabled swap on linux. You can use this gist to simulate the issue we have: https://gist.github.com/chaos-generator/9143655
Re: fragment_size not used for simple queries
It would be useful if you could post a complete recreation, mappings included. Which highlighter are you using? On Tuesday, February 25, 2014 10:39:10 AM UTC+1, Neamar Tucote wrote: Hello, Using the highlight API for a simple query like this:

curl localhost:9200/company_52fb7b90c8318c4dc86b/_search -d'{
  "fields": [],
  "query": { "filtered": { "query": { "match": { "_all": "i do not" } } } },
  "highlight": {
    "fields": {
      "metadatas.*": { "number_of_fragments": 1, "fragment_size": 20 }
    }
  }
}'

This should return snippets whose size does not exceed 20 characters. Most of the time this works; however, I do have one document, analyzed with the same mappings, which yields really long snippets - in fact, the text is not truncated at all. Here is a sample working as expected:

{took:21,timed_out:false,_shards:{total:5,successful:5,failed:0},hits:{total:19,max_score:0.24860834,hits:[{_index:company_52fb7b90c8318c4dc86b,_type:document,_id:5309c5949ba7daaa265ffdd8,_score:0.24860834,highlight:{metadatas.text:[, and <em>do</em> not hesitate]}},{_index:company_52fb7b90c8318c4dc86b,_type:document,_id:5309c5949ba7daaa265ffdd6,_score:0.14883985,highlight:{metadatas.text:[ take his child.\n<em>I</em> <em>do</em>]}},{_index:company_52fb7b90c8318c4dc86b,_type:document,_id:5309c57a9ba7daaa265ffdc8,_score:0.1365959,highlight:{metadatas.text:[ resident of DC, <em>I</em> am]}}]}}

And here is the unruly one:

{took:122,timed_out:false,_shards:{total:5,successful:5,failed:0},hits:{total:19,max_score:0.24860834,hits:[{_index:company_52fb7b90c8318c4dc86b,_type:document,_id:5309c5949ba7daaa265ffdd8,_score:0.24860834,highlight:{metadatas.text:[, and <em>do</em> not hesitate]}},{_index:company_52fb7b90c8318c4dc86b,_type:document,_id:5309c5949ba7daaa265ffdd6,_score:0.14883985,highlight:{metadatas.text:[ take his child.\n<em>I</em> <em>do</em>]}},{_index:company_52fb7b90c8318c4dc86b,_type:document,_id:5309c57a9ba7daaa265ffdc8,_score:0.1365959,highlight:{metadatas.text:[ resident of DC, <em>I</em> am]}},{_index:company_52fb7b90c8318c4dc86b,_type:document,_id:5309c57a9ba7daaa265ffdc7,_score:0.13437755,highlight:{metadatas.text:[.\n<em>I</em> <em>do</em> not enlighten those who are not eager to learn, nor arouse\nthose who are not anxious to give an explanation themselves. If <em>I</em>\nhave presented one corner of the square and they cannot come\nback to me with the other three, <em>I</em> should not go over the points\nagain.\n― Confucius\nBesides explaining JavaScript, this book tries to be an introduction to the basic\nprinciples of programming. Programming, it turns out, is hard. The\nfundamental rules are, most of the time, simple and clear. But programs,\nwhile built on top of these basic rules, tend to become complex enough to\nintroduce their own rules, their own complexity. Because of this, programming\nis rarely simple or predictable. As Donald Knuth, who is something of a\nfounding father of the field, says, it is an art.\nTo get something out of this book, more than just passive reading is required.\nTry to stay sharp, make an effort to solve the exercises, and only continue on\nwhen you are reasonably sure you understand the material that came before.\nThe computer programmer is a creator of universes for which he\nalone is responsible. Universes of virtually unlimited complexity can\nbe created in the form of computer programs.\n― Joseph Weizenbaum, Computer Power and Human Reason\nA program is many things. It is a piece of text typed by a programmer, it is\nthe directing force that makes the computer <em>do</em> what it does, it is data in the\ncomputer's memory, yet it controls the actions performed on this same\nmemory. Analogies that try to compare programs to objects we are familiar\nwith tend to fall short, but a superficially fitting one is that of a machine. The\ngears of a mechanical watch fit together ingeniously, and if the watchmaker\nwas any good, it will accurately show the time for many years. The elements\nof a program fit together in a similar way, and if the programmer knows what\nhe is doing, the program will run without crashing.\nA computer is a machine built to act as a host for these immaterial machines.\nComputers themselves can only <em>do</em> stupidly straightforward things. The reason\nthey are so useful is that they <em>do</em> these things at an incredibly high speed. A\nprogram can, by ingeniously combining many of these simple actions, <em>do</em> very\ncomplicated things.\nTo some of us, writing computer programs is a fascinating game. A program\nis a building of thought. It is costless to build, weightless, growing easily under\nour typing hands. If we get carried away, its size and complexity will grow out\nof control, confusing even the one who created it. This is the main problem of\nprogramming. It
How can I do date-calculation/conversion in an MVEL script in ES 1.0.0?
Hello, I'm considering upgrading from 0.90.3 to 1.0.0, but I've hit a snag with one of the MVEL scripts I use to update documents through the update api. My update-script uses Joda to parse/format/manipulate dates, but it appears that Joda is no longer available to MVEL scripts in version 1.0.0. (I think it changed in commit c7f6c52 from November 24th, so it's been like that for a while.) Here are some code snippets of how it currently works: parser = Joda.forPattern('dateOptionalTime').parser(); lastdate = parser.parseMillis(update.date); prevdate = parser.parseMillis(ctx._source.published); timediff = lastdate-prevdate; ... nextdate=parser.parseMutableDateTime(update.date); nextdate.addHours(calculated_hours); ctx._source.nextupdate = nextdate.toString('yyyy-MM-dd\'T\'HH:mm:ss\'Z\''); Is there some way to do similar date/time calculations in ES 1.0.0? I've considered using a native script to do these updates; however, when I wrote the update-script I tested this, and to my surprise a native script proved to be significantly slower than the MVEL script for this use-case. Any help would be much appreciated, -- Harmen
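[Editor's note] One possible workaround, offered as a sketch rather than a confirmed recommendation: the update API accepts script parameters, so the Joda parsing and formatting could be done client-side and the precomputed values passed in as params, leaving only plain long arithmetic inside MVEL. The index, type, and parameter names below (lastdate, prevdate, nextupdate_str) are illustrative:

```json
POST /myindex/mytype/1/_update
{
  "script": "ctx._source.timediff = lastdate - prevdate; ctx._source.nextupdate = nextupdate_str",
  "params": {
    "lastdate": 1393286400000,
    "prevdate": 1393200000000,
    "nextupdate_str": "2014-02-26T12:00:00Z"
  }
}
```

This only helps when the inputs to the date math are known at request time; logic that depends on values already stored in the document would still need them stored in a script-friendly form such as epoch milliseconds.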
[new project using es] Elasticboard - tracking github data
Hello again, Using the recently released github river[1], I'm working on an open source dashboard for keeping track of github projects. It's in the working prototype state right now and I'm trying to figure out what kind of information is desired and relevant. The idea is that people/orgs who want to use this will self-deploy their own instance, but in order to show what the project is about, I set up a hosted demo. There's a landing page here[2] and the demo getting data for the elasticsearch repo here[3]. I'd love some feedback! [1] https://groups.google.com/forum/#!searchin/elasticsearch/github$20river/elasticsearch/Oy57lUSn_aY/6w6uBgNcq_MJ [2] http://elasticboard.mihneadb.net/landing.html [3] http://elasticboard.mihneadb.net/#/elasticsearch/elasticsearch Thanks, Mihnea
Re: the document payload of the Delete api
Unfortunately, you'll have to GET it first.
Re: DateRange aggregation semantics - include_lower/include_upper?
Yes, you are correct. The from is inclusive, and the to is exclusive.
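[Editor's note] To illustrate the semantics confirmed above with a sketch (field name and dates are made up): because from is inclusive and to is exclusive, a document dated exactly 2014-02-01 falls only into the second bucket below, never both.

```json
{
  "aggs": {
    "by_period": {
      "date_range": {
        "field": "created_at",
        "ranges": [
          { "from": "2014-01-01", "to": "2014-02-01" },
          { "from": "2014-02-01" }
        ]
      }
    }
  }
}
```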
Re: Elasticsearch 1.0.0 is now GA
In principle, I agree with everything you describe about best practice, but those practices become more important only when you're managing larger numbers of nodes. For those who manage only 5 nodes, the balance may swing in favor of just editing each machine's config directly instead of a more centralized strategy. It's a cost/benefit question of which approach requires more work. As far as re-making configs with every version change, from what I've seen so far I don't think that is the intention of Elasticsearch (currently). The configs I see in elasticsearch.yml are largely consistent across major and minor versions... although there are exceptions. But the current scenario doesn't even change versions... much. The scenario is a reasonable and common reaction when repairing a package-based installation. SOP after a package update/upgrade fails, and then a forced update (forced re-install in place) fails, is a manual removal and re-installation (which is required when upgrading from RC to GA). And then the config file is removed. That said, with my latest RC-to-GA upgrades, I noticed the new workaround the packager is now doing. Although the config file is still being wiped out, a backup of the config file is being created. So, although a bit unusual, it works for me and should prevent the worst complaints in the future. Feature Request: Improving on the current packaging practice of creating a config backup, it would also be nice if the old config file could be parsed for uncommented commands and merged into the new config file. Tony On Monday, February 24, 2014 12:43:04 PM UTC-8, InquiringMind wrote: I am not sure what the complaints are all about. Over the past 20 years, my best practice has been to treat the installed configurations as a template that is subject to change upon reinstallation. Then, I always create my own configuration and point the server to it, and never point a server to the package's installed configuration.
And then, I maintain all of my customized configurations separately from the installed packages. Pointing to the installed configuration that you've modified is really no different from running the installed jars that you have modified. Would you really expect a reinstallation of Elasticsearch to preserve the changes you have made to the originally installed elasticsearch-1.0.0.jar file? The beauty of Elasticsearch's configurations is that they document everything but actually set nothing. That's even better than the configurations for the servers I write, in which I set everything, but to the default values in the code. Same end result; different means of getting there. In fact, the installed config is a big part of the package's documentation about what is available to be configured. So I would expect it to change on each installation. And for the turn-key servers I developed in the past where the configs were not maintained by Puppet or Chef or some other automated tool, I would write a post-installation step that would copy the installed config over a target config, but only if that target config did not exist. That way, the customer could modify the target config and their changes would be preserved. But today, our elasticsearch.yml file and other server configs are maintained by Puppet, and because we don't touch the installed config we never have any problems with overwriting on a reinstallation. Brian On Monday, February 17, 2014 5:14:46 PM UTC-5, Tony Su wrote: What?! Removing and re-installing the ES package either removes the original or over-writes the existing elasticsearch.yml. This is contrary to conventional packaging from what I've generally seen.
Typically, when a package is removed, the configuration file is left alone and must be removed manually if desired. No big deal in my case; I've been working on elasticsearch.yml heavily for several days so I can remember all the customizations I've made, but IMO this is a disaster waiting to happen for clusters with new Admins, or for those who attempt to fix a problem by removing and re-installing. Leaving the config file alone and re-using it is the safer option. IMO, Tony
Re: Elasticsearch 1.0.0 is now GA
One other issue. I have never been able to deploy an elasticsearch.yml which names the cluster node the same as the machine hostname, despite the suggestions in another thread. It just won't work, and based on another thread I strongly suspect the underlying Java code uses single quotes instead of double quotes when evaluating the variable. So, because it's a unique variable that needs to be set on each machine, that part of the config won't allow simply pointing all nodes to the same config script. That is why, short of looking for the error in the Java code, I've been looking at various simple and more enterprise tools that write individual config files to each node. Tony
Re: nodes spend all time in RamUsageEstimator after upgrade to 0.90.11
This is a known issue and will be fixed shortly. For now, what you can do is run _optimize on all your indexes and set max_num_segments to 1, like below. Note that this may take a while depending on the size of your indexes. http://localhost:9200/_optimize?max_num_segments=1
Relation Between Heap Size and Total Data Size
Hi, I created an Elasticsearch cluster with 4 instances. Elasticsearch 0.90.10 is running on all of them. Heap size is 6 GB for all the instances, so total heap size is 24 GB. I have 5 shards for each index and each shard has 1 replica. A new index is created every day, so all indices have nearly the same size. When total data size reaches around 100 GB (replicas included), my cluster begins to fail to allocate some of the shards (status yellow). After I delete some old indices and restart all the nodes, everything is fine (status green). If I do not delete some data, the status eventually turns red. So, I am wondering: is there any relationship between heap size and total data size? Is there any formula to determine heap size based on data size? Thanks, Umutcan
Re: Relation Between Heap Size and Total Data Size
Probably low on disk on at least one machine. Monitor disk usage. Also look in the logs and find out what error you are getting. Report back. Sent from my iPhone
Re: nodes spend all time in RamUsageEstimator after upgrade to 0.90.11
I forgot to say that one consequence is that the 'head' plugin interface remains empty. The following requests time out:
* _status
* stats?all=true
* _nodes
How can I get any information about the cluster under these conditions? Benoît
Re: Elasticsearch 1.0.0 is now GA
I always start Elasticsearch from within my own wrapper script, es.sh. Inside this wrapper script is the following incantation: NODE_OPT=-Des.node.name=$(uname -n | cut -d'.' -f1) This is verified to work on Linux, Mac OS X, and Solaris (at least). I then pass $NODE_OPT as a command-line argument to the elasticsearch start-up script. BTW, I seem to recall reading that the es. prefix on the node.name variable is no longer needed for 1.0 GA. But it still works fine, so I have left it there. This has always worked since ES 0.19.4 (the very first version I installed and started using). I worked closely with our deployment engineer, and we settled on a set of wrapper scripts that let me start everything on my laptop in exactly the same way that it all starts on a production server. Brian On Tuesday, February 25, 2014 10:21:29 AM UTC-5, Tony Su wrote: One other issue. I have never been able to deploy an elasticsearch.yml which names the cluster node the same as the machine hostname, despite the suggestions in another thread. It just won't work, and based on another thread I strongly suspect the underlying Java code uses single quotes instead of double quotes when evaluating the variable. So, because it's a unique variable that needs to be set on each machine, that part of the config won't allow simply pointing all nodes to the same config script. That is why, short of looking for the error in the Java code, I've been looking at various simple and more enterprise tools that write individual config files to each node. Tony
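A minimal sketch of such a wrapper, with the start-up path left as an assumption:

```shell
# es.sh: derive the node name from the short hostname and pass it to ES.
NODE_OPT="-Des.node.name=$(uname -n | cut -d'.' -f1)"  # e.g. "web01" from "web01.example.com"
echo "Starting with $NODE_OPT"
# exec /path/to/elasticsearch/bin/elasticsearch $NODE_OPT   # path is site-specific
```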
Re: the document payload of the Delete api
And note that if you GET it and save the version number, and then pass the version number into the DELETE, you can be sure it will be deleted only if nobody else updated it in the meantime. This all works so much better in Java than in scripts + curl. Brian On Tuesday, February 25, 2014 9:35:37 AM UTC-5, Binh Ly wrote: Unfortunately, you'll have to GET it first.
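As a curl sketch of that GET-then-versioned-DELETE flow (index, type, id, and version number are placeholders):

```shell
# 1) GET the document and note the _version field in the response.
curl -s 'http://localhost:9200/myindex/mytype/1?pretty'
# 2) DELETE only if the document is still at that version; a concurrent
#    update bumps _version and this delete fails with 409 Conflict instead.
curl -s -XDELETE 'http://localhost:9200/myindex/mytype/1?version=7'
```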
Re: Default analyzer when the given analyzer not found?
Based on posts to this newsgroup early on in my usage of ES (over a year now!), I used to put the following in my elasticsearch.yml file. Any field that was not explicitly assigned an analyzer and that was deemed by ES to be a string would pick up the English snowball analyzer with no stop words (my preference at the time):

index:
  analysis:
    analyzer:
      # set stemming analyzer with no stop words as the default
      default:
        type: snowball
        language: English
        stopwords: _none_
    filter:
      stopWordsFilter:
        type: stop
        stopwords: _none_

But since then, I've long abandoned this default approach. Instead, I explicitly assign an analyzer to each and every field (you know, like a real database!). And my elasticsearch.yml file now contains the following:

# Do not automatically create an index when a document is loaded, and do
# not automatically index unknown (unmapped) fields:
action.auto_create_index: false
index.mapper.dynamic: false

Therefore, I cannot automatically create an index during a load (which would then create a useless index without any of the analyzers and mappings I've carefully crafted). And ES will not automatically create a new field; this is very helpful when someone uses a low-level tool such as curl and misspells a field name; ES will no longer create, for example, the givveName field when it should have been givenName. Brian On Tuesday, February 25, 2014 8:57:30 AM UTC-5, Frederic Meyer wrote: Hey there. Nearly one year after this initial post, I'm running into the exact same issue, even though ES is now released (1.0). Has anybody found a proper solution within ES? I've spent about an hour searching for this, without any luck. The only ugly workaround I can think of right now is to deal with a fallback language at the data level, i.e. before sending documents to be indexed by ES. Thanks.
Re: Default analyzer when the given analyzer not found?
Ah yes, via the default in the yaml configuration file, of course. I'll give that a try, thanks! It is a pity, though, that the default analyzer doesn't seem to do its job of processing all unmatched documents as far as the _analyze field is concerned. Thanks Fred P.S.: I do understand your position about not indexing documents for which you haven't crafted a dedicated analyzer yet. Makes real sense. On Tuesday, February 25, 2014 5:09:43 PM UTC+1, InquiringMind wrote: [quoted message trimmed]
Re: nodes spend all time in RamUsageEstimator after upgrade to 0.90.11
Thank you Binh Ly, On Tuesday, February 25, 2014 4:25:59 PM UTC+1, Binh Ly wrote: This is a known issue and will be fixed shortly. For now, what you can do is run _optimize on all your indexes and set max_num_segments to 1, like below. Note that this may take a while depending on the size of your indexes. http://localhost:9200/_optimize?max_num_segments=1 Your suggestion confirms what Jörg Prante said here: https://groups.google.com/d/msg/elasticsearch/7mrDhqe6LEo/3gjOJka85OYJ This is a problem with Lucene segments of version 3.x. I have around 1 TB of index data, so I'm not really happy to run optimize; I will try it on one of the smallest indexes. If I stop all the requests to the statistics API, should I see the load decrease? Regards. Benoît
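Spelled out as a per-index curl call (index name is a placeholder), which makes it easier to optimize the smaller indexes one at a time rather than all 1 TB at once:

```shell
# Force-merge one index down to a single segment; expect heavy I/O while it runs.
curl -XPOST 'http://localhost:9200/myindex/_optimize?max_num_segments=1'
```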
Re: [Book] Mastering ElasticSearch Review
I purchased the book when Packt was having a $5 ebook sale a couple of months ago. I did not really need the book, but it was cheap and I wanted to support the author, who has posted on the mailing list in the past. Overall a decent book, recommended for anyone getting started with Elasticsearch. My main complaint was that the book went through each configuration parameter in detail, resulting in a lot of bloat. Some might consider such an approach a good thing. Ivan On Mon, Feb 24, 2014 at 6:55 PM, Nick Wood nwood...@gmail.com wrote: I read Elasticsearch Server several months ago and found it helpful. But I'm hesitant to get any more books that aren't focused on 1.x - hopefully we'll see some pop up soon (nudge nudge).
Re: dumping index is slow as hell
Have you benchmarked your cluster? How many docs can you index per second with bulk indexing? Jörg
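A rough way to get such a number, assuming a local node and a made-up document shape; a real benchmark should use your own documents and much larger batches:

```shell
# Time a small bulk request; scale the batch up to measure sustained docs/sec.
cat > /tmp/bulk.ndjson <<'EOF'
{"index":{"_index":"benchtest","_type":"doc"}}
{"field1":"value1"}
{"index":{"_index":"benchtest","_type":"doc"}}
{"field1":"value2"}
EOF
time curl -s -XPOST 'http://localhost:9200/_bulk' --data-binary @/tmp/bulk.ndjson
```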
Re: upgrade to elasticsearch 1.0 now ClassCastException: class ElasticSearch090PostingsFormat
Not sure, but maybe you have jars with ES classes in the plugins folder that went astray? IIRC I have seen these kinds of errors, and it was a plugin with dependencies that were not compatible. If that is code you can hack on, a last resort is printing the current classpath to the log file... Jörg
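One way to hunt for stray jars, with the search roots as assumptions (adjust to wherever your app server and ES plugins actually live):

```shell
# Duplicate or mismatched elasticsearch jar versions in this output are prime suspects.
find /opt /usr/share/tomcat* -name 'elasticsearch*.jar' 2>/dev/null || true
```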
Compute TF/IDF across indexes
Hi, I'm trying to search across multiple indexes and I couldn't understand the result of the TF/IDF function. I didn't expect documents in the indexes where the term is more frequent to get penalized. Here is an example: https://gist.github.com/luizgpsantos/9216108 When searching for the term alice, the document {_index: index2, _type: type, _id: 1} got a score of 0.8784157 while {_index: index1, _type: type, _id: 1} got a score of 0.4451987. In my use case I have one index about sports and another about celebrities, and when I search for a celebrity across the sports and celebrities indexes, results from the sports index tend to appear in first place due to the effect above (we have few celebrity documents in the sports index). But the point is that when searching for a celebrity I would expect results from the celebrity index. Is there any way to calculate the score without penalizing indexes where the frequency of a term is higher? Cheers, -- Luiz Guilherme P. Santos
Re: Compute TF/IDF across indexes
I have never tried or looked at the code, but off the top of my head perhaps the DFS query type would work: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-search-type.html#dfs-query-then-fetch Since the DFS query type calculates the TF/IDF values based on the values in each individual shard, perhaps it ignores which index the shard belongs to. Easy to test. If not, the solution might be tricky. You can eliminate term length normalization, but your issue is with the IDF. You can create your own Similarity, but the best you can do is ignore the IDF, which probably would not be ideal. Ultimately, you can try script-based scoring. The TF/IDF values are exposed to the scripts, so you can try to apply some type of normalization yourself. Kludgy, and it would impact performance. http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html Hopefully DFS queries will work, or someone else has a better idea! Cheers, Ivan On Tue, Feb 25, 2014 at 12:00 PM, Luiz Guilherme Pais dos Santos luizgpsan...@gmail.com wrote: [quoted message trimmed]
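Trying the DFS suggestion could look like this (index and field names are placeholders, not taken from the gist):

```shell
# dfs_query_then_fetch gathers distributed term statistics before scoring,
# instead of using each shard's local IDF.
curl -s -XPOST 'http://localhost:9200/index1,index2/_search?search_type=dfs_query_then_fetch&pretty' -d '
{
  "query": { "match": { "name": "alice" } }
}'
```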
Re: Removing elasticsearch logs
There is currently discussion around this: https://github.com/elasticsearch/elasticsearch-marvel/issues/95 But in the meantime, try this to see if it helps: https://github.com/elasticsearch/curator
Re: Put mapping documentation -- What options are available? Specifically, how to store a property but without indexing it?
Dear Hariharan, Alex, Luke, My apologies. You're quite right. The information is there -- I just didn't read far enough down. Thank you for your help and persistence. Best regards, - Daniel
Re: ES 1.0.0 Source filtering using the Java API
Thanks for your response. I can't see the method 'setFetchSource' in the Client class. Are you sure it is in 1.0.0? On Tuesday, February 25, 2014 8:41:37 PM UTC, Binh Ly wrote: Yes, you can use the setFetchSource() method on the search request builder:

SearchResponse response = client.prepareSearch(index)
    .setFetchSource(new String[] {"field1", "field2"}, null)
    .execute()
    .actionGet();
Sorting date fields
Hi all, I have a question about how sorting during queries works in Elasticsearch. I have an index with a custom date format field, on which the sort is applied. When querying the index for a given keyword, results are returned in the given sort order. However, I've observed that some documents are not present in the result set. I would have expected these results to be part of the result set, as they would be in relational systems using the SQL ORDER BY statement. I've verified with the explain API that these missing documents are matched by the query. According to the documentation, score computation is not performed when sorting on fields. Maybe someone can provide more information on how sorting is done? I am using Elasticsearch 1.0.0RC1 on Debian wheezy with openjdk-7-jdk. Thanks, Adrian
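For reference, the query shape being described, with placeholder index and field names:

```shell
# Match query sorted on a date field; with a field sort, _score is not
# computed unless track_scores is enabled.
curl -s -XPOST 'http://localhost:9200/myindex/_search?pretty' -d '
{
  "query": { "match": { "body": "keyword" } },
  "sort":  [ { "created": { "order": "desc" } } ]
}'
```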
Re: scalability and creating 1 index per user
On Tue, Feb 25, 2014 at 4:46 PM, ESUser neerav...@gmail.com wrote: Hi All, I am exploring Elasticsearch to create one index per user instead of one big index for all the users. Each index would be about 6 GB. I am wondering if anyone has tried it and how it would scale. I couldn't find any Elasticsearch limit on the maximum number of indices. Is it safe/recommended to have, say, 20K indices for 20K users? Would this architecture scale well? I'm running 1107 indexes right now. Some of the cluster actions are a bit slower than I'd like, but I think that is better in 1.0. I don't think it'd work well an order of magnitude larger, but I could be wrong. Also, if I start with, say, a 5-node cluster now, and add more nodes as I need them, does ES redistribute its shards every time I add new nodes? How are newly added nodes utilized in a cluster? It'll smooth the shards out across the new nodes. There is configuration for how many concurrent moves can take place and how much bandwidth is allowed per move. The defaults are a bit slow, especially if you have a fast network and disks. Nik
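The rebalance and recovery knobs Nik mentions can be adjusted at runtime; a sketch via the cluster settings API (the values are examples only, not recommendations):

```shell
# Allow more concurrent shard moves and more recovery bandwidth per move.
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '
{
  "transient": {
    "cluster.routing.allocation.cluster_concurrent_rebalance": 4,
    "indices.recovery.max_bytes_per_sec": "100mb"
  }
}'
```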
Re: scalability and creating 1 index per user
20K is a lot of indexes, probably too many, as ES will need to maintain state about each of those in memory, which could mean you have nothing left for caching indexed data! You might want to look at http://www.elasticsearch.org/blog/customizing-your-document-routing/ instead; that way you can reduce your index count but still gain the same usability outcome. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 26 February 2014 08:52, Nikolas Everett nik9...@gmail.com wrote: [quoted message trimmed]
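A sketch of the routing approach from that blog post, with placeholder names: one shared index, and each user's documents pinned to a single shard by a routing value.

```shell
# Index with an explicit routing value...
curl -XPOST 'http://localhost:9200/users/message?routing=user42' -d '{"body":"hello"}'
# ...and search with the same value so only that user's shard is queried.
curl -XPOST 'http://localhost:9200/users/message/_search?routing=user42' -d '
{ "query": { "match": { "body": "hello" } } }'
```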
Re: Sorting date fields
ES loads the values of the fields to sort on into memory cache. You should update to 1.0.0, maybe you hit a bug that has been fixed. Jörg
Re: Elasticsearch 1.0.0 is now GA
I do not use quotes at all. Simply: node.name: ${HOSTNAME} -- Ivan On Tue, Feb 25, 2014 at 7:56 AM, InquiringMind brian.from...@gmail.com wrote: [quoted message trimmed]
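Spelling out Ivan's approach: ES expands ${VAR} references in elasticsearch.yml from the process environment, so the variable must actually be exported; some shells set HOSTNAME without exporting it. The paths here are assumptions:

```shell
# elasticsearch.yml contains the line:
#   node.name: ${HOSTNAME}
export HOSTNAME="$(hostname -s)"   # make sure ES's environment has it
# bin/elasticsearch                # then start ES as usual
```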
Re: ES 1.0.0 Source filtering using the Java API
Hmmm, can you please double-check? I can see it in the tests here: https://github.com/elasticsearch/elasticsearch/blob/v1.0.0/src/test/java/org/elasticsearch/search/source/SourceFetchingTests.java
Re: Sorting date fields
On Tue, Feb 25, 2014 at 11:11:13PM +0100, joergpra...@gmail.com wrote: Jörg, ES loads the values of the fields to sort on into memory cache. Yes, I've read that; is it known when these caches are flushed? You should update to 1.0.0, maybe you hit a bug that has been fixed. I'll do that. I am just wondering if I am missing something. Best regards, Adrian
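There is also a manual escape hatch: as of the 0.90/1.0 docs, the clear-cache API can target field data specifically (index name and the exact parameter spelling are per those docs; verify against your version):

```shell
# Drop the field data cache for one index by hand.
curl -XPOST 'http://localhost:9200/myindex/_cache/clear?field_data=true'
```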
Re: ES 1.0.0 Source filtering using the Java API
Yes, I can see it. Thanks. On 25 Feb 2014, at 22:23, Binh Ly binhly...@yahoo.com wrote: [quoted message trimmed]
Re: upgrade to elasticsearch 1.0 now ClassCastException: class ElasticSearch090PostingsFormat
Many, many, way too many hours later, it came down to what everyone was suggesting was the problem in the first place: an old elasticsearch jar sitting in an abandoned directory but still scanned by Tomcat's class loader. Thanks for your help.
Re: Sorting date fields
For the cache, see http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/index-modules-fielddata.html By default, the field data cache size is unbounded and does not expire. For sorting, each field to sort on is examined and all values of the field are loaded, so that the in-memory sorting can take place. It's exactly what Lucene executes. With the default settings of the field data cache, sorting works correctly (unless the field values exceed the available memory). Maybe you can set up an example of your sort as a demo, so that the error can be reproduced? Jörg
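The page linked above also describes bounding that cache in elasticsearch.yml; a sketch with illustrative values (not recommendations):

```yaml
# Cap the field data cache so sorting on large fields cannot exhaust the heap
# (1.x settings).
indices.fielddata.cache.size: 40%
indices.fielddata.cache.expire: 10m
```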
copy_to objects?
Does copy_to work with objects?
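For reference (not an answer from this thread): as I understand the mapping docs, copy_to is declared on leaf fields, not on an object as a whole, but leaf fields inside an object can each copy their values into a shared field. A hedged sketch, with hypothetical names ("person", "full_name"):

```python
# Mapping sketch (as a Python dict standing in for the JSON body).
# copy_to is set per leaf field; the object "person" itself has no copy_to.
mapping = {
    "properties": {
        "person": {
            "properties": {
                "first": {"type": "string", "copy_to": "full_name"},
                "last":  {"type": "string", "copy_to": "full_name"},
            }
        },
        # Target field that receives the copied values.
        "full_name": {"type": "string"},
    }
}
```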
EsRejectedExecutionException when searching date based indices.
Hello all, I’m getting failed nodes when running searches and I’m hoping someone can point me in the right direction. I have indices created per day to store messages. The pattern is pretty straightforward: the index for January 1 is messages_20140101, for January 2 is messages_20140102, and so on. Each index is created against a template that specifies 20 shards. A full year will give 365 indices * 20 shards = 7300 nodes. I have recently upgraded to ES 1.0. When I search for all messages in a year (either using an alias or specifying “messages_2013*”), I get many failed nodes. The reason given is: “EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4@651b8924]”). The more often I search, the fewer failed nodes I get (probably caching in ES), but I can’t get down to 0 failed nodes. I’m using ES for analytics, so the document counts coming back have to be accurate. The aggregate counts will change depending on the number of node failures. We use the Java API to create a local node to index and search the documents. However, we also see the issue if we use the URL search API on port 9200. If I restrict the search to 30 days then I do not see any failures (it’s under 1000 nodes, as expected). However, it is a pretty common use case for our customers to search messages spanning an entire year. Any suggestions on how I can prevent these failures? Thank you for your help!
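The arithmetic behind the rejections can be sketched directly (a back-of-the-envelope, assuming the default 1.0 search thread-pool queue of 1000 that the exception message reports): a search fans out one shard-level request per shard it touches, and each node queues those requests.

```python
# One shard-level request per shard touched by the wildcard search.
indices_per_year = 365
shards_per_index = 20
total_shard_requests = indices_per_year * shards_per_index  # 7300

# Per-node search queue capacity, from the exception message.
search_queue_capacity = 1000

# On a small cluster, thousands of shard requests land on each node at
# once; anything beyond the queue capacity is rejected with
# EsRejectedExecutionException, which surfaces as "failed" entries.
overflow = total_shard_requests - search_queue_capacity
```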
Need help with a large cluster restart.
I have 20 ES data nodes and 10 master nodes in my cluster, with minimum master nodes set to 6 for the cluster to function. I wanted to know if anyone knows of a correct way to restart a large cluster. I see different results on each cluster restart. Sometimes some of the shards are in the Unassigned state and stay stuck there; sometimes the shards get re-allocated. So far, I am always doing a full cluster restart. All I want to do is restart and come back to the state the cluster was in before the restart. I really appreciate any insight into this, or a link to documentation about cluster restarts. Thanks.
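A commonly recommended sequence (an assumption on my part, not something stated in this thread, and the exact setting name varies between 0.90/1.x versions) is to disable shard allocation before taking nodes down, so the cluster does not try to re-allocate shards while nodes drop out, then re-enable it once everything is back:

```python
# Bodies that would be PUT to _cluster/settings before and after the
# restart. Setting name assumed from the 1.x cluster docs.
disable_allocation = {
    "transient": {"cluster.routing.allocation.disable_allocation": True}
}
# After all nodes rejoin, re-enable allocation so replicas recover.
enable_allocation = {
    "transient": {"cluster.routing.allocation.disable_allocation": False}
}
```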
Lost index metadata and overwriting pre-existing index files
Hi - I recently experienced some surprising elasticsearch behavior and I'd appreciate some verification on the whys behind what we saw. Basically, during a cluster restart we lost some index metadata causing those indices to not be realized and loaded from the data nodes (raw index files still existed on disk), then, before we realized that and had a chance to recover them, new incoming data caused the cluster to create new indices under the same names, completely overwriting the original, raw index data on disk (clearing out and losing a lot of data). If that's unclear or for further details, I've posted the scenario and straightforward steps to reproduce: https://github.com/dpb587/elasticsearch-lost-index. These are my core questions... 1. Is it true that index metadata (sharding size, mapping, etc) will only ever be stored on master-capable nodes? Previously, my understanding of the master was that it was primarily responsible for managing cluster state and coordinating cluster balancing, not persisting index metadata. (I'm not arguing it doesn't necessarily make sense, just that I didn't realize cluster state included the index metadata) 2. Is there documentation on elasticsearch.org which more precisely defines the responsibilities of master and data nodes? The only vague references I've come across are http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/modules-node.html, the elasticsearch default configuration file, and various non-authoritative blog posts and Stack Overflow answers, none of which prompted me to realize data nodes would not hold their own metadata. 3. Is it true that elasticsearch (Lucene?) will overwrite existing data files without error or warning if the cluster is not aware of the index? If so, is there a way to disable that behavior to avoid accidental data loss due to misconfiguration (aside from the broad `action.auto_create_index` setting)? If not, is there anything else which would explain the behavior we saw? 
Thank you for your time! Danny
Re: Put mapping documentation -- What options are available? Specifically, how to store a property but without indexing it?
Luke? :) On Tue, Feb 25, 2014 at 1:09 PM, Daniel Winterstein daniel.winterst...@gmail.com wrote: Dear Hariharan, Alex, Luke, My apologies. You're quite right. The information is there -- I just didn't read far enough down. Thank you for your help and persistence. Best regards, - Daniel
Re: Compute TF/IDF across indexes
Hi Ivan, The DFS query then fetch worked very well! Thank you! Cheers, Luiz Guilherme On Tue, Feb 25, 2014 at 5:15 PM, Ivan Brusic i...@brusic.com wrote: I have never tried or looked at the code, but off the top of my head perhaps the DFS query type would work: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-search-type.html#dfs-query-then-fetch Since the DFS query type calculates the TF/IDF values based on the values in each individual shard, perhaps it ignores which index the shard belongs to. Easy to test. If not, the solution might be tricky. You can eliminate term length normalization, but your issue is with the IDF. You can create your own Similarity, but the best you can do is ignore the IDF, which probably would not be ideal. Ultimately, you can try script-based scoring. The TF/IDF values are exposed to the scripts, so you can try to apply some type of normalization yourself. Kludgy, and it would impact performance. http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html Hopefully DFS queries will work, or someone else has a better idea! Cheers, Ivan On Tue, Feb 25, 2014 at 12:00 PM, Luiz Guilherme Pais dos Santos luizgpsan...@gmail.com wrote: Hi, I'm trying to search across multiple indexes and I couldn't understand the result of the TF/IDF function. I didn't expect indexes where the term is more frequent to be penalized. Here follows an example: https://gist.github.com/luizgpsantos/9216108 When searching for the term alice, the document {_index: index2, _type: type, _id: 1} got a score of 0.8784157 while {_index: index1, _type: type, _id: 1} got a score of 0.4451987. In my use case I have one index about sports and another about celebrities, and when I search for celebrity documents across both indexes, results from the sports index tend to appear first due to the explanation above (we have few celebrity documents in the sports index). But the point is that when searching for a celebrity I would expect results from the celebrity index. Is there any way to calculate the score without penalizing indexes where the frequency of a term is higher? Cheers, -- Luiz Guilherme P. Santos
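The fix the thread converges on can be sketched as follows (Python dicts standing in for the REST request; the index names match the gist, the field name "name" is an assumption). dfs_query_then_fetch pre-collects distributed term and document frequencies so scores use global rather than per-shard statistics:

```python
# Query-string parameter selecting the DFS search type:
#   GET /index1,index2/_search?search_type=dfs_query_then_fetch
search_params = {"search_type": "dfs_query_then_fetch"}

# The search body itself is unchanged; only the search type differs.
search_body = {"query": {"match": {"name": "alice"}}}
```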
Re: Need help with a large cluster restart.
Some of these will help - http://gibrown.wordpress.com/2013/12/05/managing-elasticsearch-cluster-restart-time/ Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com
Interesting question on Transaction Log record mutability
Hi guys, If I turn off automatic indexing and refreshing, and continually execute partial updates on the same document (say 100 times), do the updates change the same record in the transaction log, or will they create 100 changes? The reason I'm curious is that when I ask ES to index (or refresh) after a batch of partial updates, will it try to index the same document 100 times or just once? Efficiency seems to be important here. My data structure is a Customer with lots of Transactions, with each record containing a date, description, and dollar amount. I would like to see if a denormalized data structure works here by keeping a list of transactions on the customer, then updating new transactions into the same customer record. But this would be very inefficient if the document had to be reindexed as many times as the number of incoming partial updates. I'm hoping I can control this by turning off indexing/refreshing and letting ES update the same record in the transaction log. I understand that Lucene has immutable records, but that does not necessarily mean that the transaction log has to be immutable, right? Thanks for any feedback/thoughts!! Yuri
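One way to bound the reindex count regardless of how the translog behaves (a client-side workaround I'm assuming, not a statement about ES internals) is to accumulate incoming transactions and apply them as a single scripted partial update per batch, so the document is reindexed once per batch rather than once per transaction:

```python
# Client-side batching sketch. Field names ("transactions") and the MVEL
# script are illustrative, matching the 1.x update-API style.
pending = [
    {"date": "2014-02-25", "description": "coffee", "amount": 3.50},
    {"date": "2014-02-26", "description": "book",   "amount": 12.00},
]

# One _update request applies the whole batch in a single reindex pass.
update_body = {
    "script": "ctx._source.transactions += new_txns",
    "params": {"new_txns": pending},
}
```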
Re: Kibana: showing a ratio
Ok, I'll check it out On Tuesday, 25 February 2014 00:17:20 UTC+2, Binh Ly wrote: Unfortunately not at the moment. But if you're up to it, you can probably easily write a custom panel that will do this for you.
Re: Compute TF/IDF across indexes
Great, I am glad that it worked. I do not use multi-index searches, so I was not sure if it would. Good to know that shards from different indices can be aggregated with DFS queries. -- Ivan On Tue, Feb 25, 2014 at 6:04 PM, Luiz Guilherme Pais dos Santos luizgpsan...@gmail.com wrote: Hi Ivan, The DFS query then fetch worked very well! Thank you! Cheers, Luiz Guilherme
Re: Relation Between Heap Size and Total Data Size
There is enough space on every machine. I looked in the logs and found that org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/ebs/elasticsearch/elasticsearch-0.90.10/data/elasticsearch/nodes/0/indices/logstash-2014.02.26/0/index/write.lock is what causes the shard to fail to start. On 02/25/2014 05:29 PM, Randy wrote: Probably low on disk on at least one machine. Monitor disk usage. Also look in the logs and find out what error you are getting. Report back. Sent from my iPhone On Feb 25, 2014, at 7:25 AM, Umutcan umut...@gamegos.com wrote: Hi, I created an Elasticsearch cluster with 4 instances, all running Elasticsearch 0.90.10. Heap size is 6 GB for each instance, so the total heap size is 24 GB. I have 5 shards for each index and each shard has 1 replica. A new index is created every day, so all indices are nearly the same size. When the total data size reaches around 100 GB (replicas included), my cluster begins to fail to allocate some of the shards (status yellow). After I delete some old indices and restart all the nodes, everything is fine (status green). If I do not delete some data, the status eventually turns red. So I am wondering: is there any relationship between heap size and total data size? Is there any formula to determine heap size based on data size? Thanks, Umutcan
Histogram of high-cardinality aggregate
Hey folks, Playing around with the aggregation API, I was wondering whether this is possible. Taking the example at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-nested-aggregation.html , how would I get the histogram of the minimum price [not all prices] of all the products?
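A hedged sketch of the pieces involved, under the assumption (mine, not confirmed by the docs) that bucketing documents by their per-document nested minimum is not a built-in aggregation in ES 1.0: a nested min aggregation gives the overall minimum resellers.price, and one workaround is to fetch per-product minima and build the histogram client-side.

```python
# Nested min aggregation from the linked docs example (as a Python dict).
agg_body = {
    "aggs": {
        "resellers": {
            "nested": {"path": "resellers"},
            "aggs": {"min_price": {"min": {"field": "resellers.price"}}},
        }
    }
}

def histogram(minima, interval):
    """Client-side histogram over per-product minimum prices."""
    buckets = {}
    for m in minima:
        key = (m // interval) * interval  # floor to the bucket boundary
        buckets[key] = buckets.get(key, 0) + 1
    return buckets

# e.g. histogram of three hypothetical per-product minima, 500-wide buckets
counts = histogram([350, 500, 1200], 500)
```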
Re: Text Categorization in ES
Hi All, To be specific, I want a query like this: searching for Laptop will automatically give results for Dell, Sony, HP, Lenovo, Samsung, etc. as well. As lingo3g is used for clustering the documents, it will store references to the above terms as well. For that I have installed Carrot2 and Lingo3g on top of ES. So what should my query be, with respect to lingo3g, to search for the specified items? Or is there anything else I have to do to make it work?
Re: EsRejectedExecutionException when searching date based indices.
You are mixing nodes and shards, right? How many elasticsearch nodes do you have to manage your 7300 shards? Why did you set 20 shards per index? You can increase the queue size in elasticsearch.yml, but I'm not sure it's the right thing to do here. My 2 cents -- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
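The knob David mentions, for completeness (setting name assumed from the 1.x thread-pool docs, so treat it as a sketch): the search thread-pool queue can be raised in elasticsearch.yml, at the cost of more memory held by queued shard requests; reducing the shard count is usually the better fix.

```python
# elasticsearch.yml fragment, represented as a dict:
#   threadpool.search.queue_size: 2000
# Raising the queue trades rejections for queued (memory-holding) work.
setting = {"threadpool.search.queue_size": 2000}
```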