Re: How to retrieve cluster and node stats on a data node when HTTP is disabled (http.enabled: false)
How do you search in your cluster? Are you using the Java client?

On 22 Oct 2014, at 03:17, Terence Tung tere...@teambanjo.com wrote:

Hi there, I followed the recommendation from http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-node.html to create dedicated master, client, and data nodes. For my master and data nodes, I disabled http.enabled so they communicate only over the transport protocol on port 9300. However, we were previously using curl localhost:9200/_cluster/stats and /_nodes/stats to fetch monitoring stats (e.g. heap usage, number of docs, thread counts, etc.). My question is: how can I still fetch these monitoring stats? I searched and read through the Elasticsearch docs but couldn't find an answer. Please help. Thanks, and I really appreciate any help.
Re: How to check that Elasticsearch is available
If you are using Java, you can do:

node.client().admin().cluster().prepareHealth().setWaitForYellowStatus().execute().actionGet();

On 22 Oct 2014, at 04:33, Weiguo Xia xia...@gmail.com wrote:

Hi, I am new to Elasticsearch and have run into a problem. I am writing code that needs Elasticsearch to be ready for use. My code starts Elasticsearch first, but it takes some time before Elasticsearch is ready. How can I tell when Elasticsearch is ready? Thank you. WX
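If you are checking readiness over HTTP instead of the Java API, the cluster health endpoint accepts the same kind of wait; a minimal sketch (host, port, and timeout are assumptions):

```
# Blocks until the cluster reaches at least yellow status or 30s elapse;
# the response JSON carries "status" and "timed_out" fields to test on.
curl -s 'localhost:9200/_cluster/health?wait_for_status=yellow&timeout=30s'
```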
Zoning vs using several clusters
Hi everyone,

In our cluster, we have several types of rather complex documents and are contemplating the possibility of separating them. As far as I know, we have two options:

1) Using routing to create zones on our cluster, and indexing each type of document to its assigned zone.
- That would make for easier maintenance, since we would only have one cluster to take care of.
- Our web application would only need one client as well.
- This allows for transversal requests (i.e. requests on all types of documents at once).
However:
- If our cluster suffers from the dreaded split-brain (yes, we still get it despite using proper configuration, due to network problems), all our indices and all our indexing processes are impacted; thus all our data is at risk, and our web app is utterly unusable.

2) Using a cluster per document type.
- A bit of split-brain resilience: a cluster entering split-brain would only impact its own document type, lowering the overall risk.
- That would allow us to close parts of our web app while leaving others open.
However:
- No more transversal requests (we are not using them, so this is not really a con for us).
- A slightly more complex web app (it needs one client per cluster, and we need to make sure we are using the proper one).
- Harder ES maintenance (we would need eyes on every one of these clusters).

My understanding of ES leads me to believe that both methods would be equally efficient query-wise and indexing-wise (though I could be wrong). So we wonder:
- Are there any other benefits to using routing over using several clusters?
- Am I right to think that either method will perform the same?

Any insight/advice on the matter would be a tremendous help. Thanks.
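If "zones" here means pinning each document type's index to its own subset of nodes, the usual mechanism is shard allocation filtering on a custom node attribute; a sketch under that assumption (the attribute name and index name are hypothetical):

```
# In elasticsearch.yml, tag each node with a zone attribute, e.g.:
#   node.zone: typeA
# Then pin an index to that zone when creating it:
curl -XPUT 'localhost:9200/docs_type_a' -d '{
  "settings": {
    "index.routing.allocation.include.zone": "typeA"
  }
}'
```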
Re: Children aggregation (1.4.0.Beta1) Round-Robin result
Hi Martijn,

Would you help with another question on this topic? I read that ES stores parent-child relations on the heap; could it be that this bug prevents some objects from being GC-ed, i.e. that there is a memory leak? And what happens if there is no more heap but more parent-child relations keep coming in? The reason I'm asking is that our cluster (8 r3.xlarge nodes, etc.) went down after 2 days of updating parent-child relations. The index volume is tiny, but the number of child documents updated is huge. Thank you. Vlad

On Tuesday, October 21, 2014 4:38:55 PM UTC+2, Martijn v Groningen wrote:

Hi Vlad, I opened: https://github.com/elasticsearch/elasticsearch/pull/8180 Many thanks for reporting this issue! Apart from this bug the parent/child model works well, so I recommend keeping it. I don't know exactly when the next 1.4 release will be out, but I expect within a week or two. Martijn

On 21 October 2014 16:17, Vlad Vlaskin vl...@admoment.ru wrote:

Hi Martijn, great news, thank you! Would you recommend keeping the parent-child data model and waiting for a release? (Do you have a feeling for the date?) Thank you, Vlad

On Tuesday, October 21, 2014 4:01:47 PM UTC+2, Martijn v Groningen wrote:

Hi Vlad, I reproduced it. The children agg doesn't properly take documents marked as deleted into account. When documents are deleted they are initially marked as deleted before they're removed from the index. This also applies to updates, because an update translates into an index + delete. The issue you're experiencing can also happen when not using the bulk API; it may just be a bit less likely to manifest. The fix for this bug is small. I'll open a PR soon. Martijn

On 21 October 2014 15:51, Vlad Vlaskin vl...@admoment.ru wrote:

Hi Martijn, a couple of hours ago I tried to submit a bug on the ES GitHub issues, and while writing up the reproduction steps I realized one more thing. *It happens only if you update the same child document within one bulk request.* I didn't manage to reproduce the arithmetic-progression effect by curling my localhost, but it is still reproducible from Java code doing a bulk update (script + upsert doc). I understand that bulk-updating the same document is a pretty ugly thing, and I was surprised when it worked normally (without exceptions about version conflicts) from the Java client. If it might be helpful, these are the steps and queries to curl your localhost with parent-child. Unfortunately I don't know how to express the bulk updates as curl.

# Create index test with parent-child mappings
curl -XPUT localhost:9200/test -d '{"mappings":{"root":{"properties":{"country":{"type":"string"}}},"metric":{"_parent":{"type":"root"},"properties":{"count":{"type":"long"}}}}}'

# Index parent document:
curl -XPUT localhost:9200/test/root/1 -d '{"country":"de"}'

# Index child document:
curl -XPUT 'http://localhost:9200/test/metric/1?parent=1' -d '{"count":1}'

# Update child document:
curl -XPOST 'http://localhost:9200/test/metric/1/_update?parent=1' -d '{"script":"ctx._source.count+=ct","params":{"ct":1}}'

# Query with the benchmark query, it should return 2
curl -XGET localhost:9200/test/_search -d '{"size":0,"query":{"match_all":{}},"aggs":{"requests":{"sum":{"field":"count"}}}}'

# Query with the child aggregation query, expected 2
curl -XGET localhost:9200/test/metric/_search -d '{"size":0,"query":{"match_all":{}},"aggs":{"child":{"children":{"type":"metric"},"aggs":{"requests":{"sum":{"field":"count"}}}}}}'

Thank you

On Tuesday, October 21, 2014 3:33:35 PM UTC+2, Martijn v Groningen wrote:

Hi Vlad, what you're describing shouldn't happen. The child docs should get detached. I think this is a bug. Let me verify and get back to you. Martijn

On 21 October 2014 13:26, Vlad Vlaskin vl...@admoment.ru wrote:

After some experiments I believe I found the cause of the discrepancy: *Elasticsearch does not detach the old versions of a child object after it has been updated, and still uses them in the children aggregation.* E.g. if my child has been updated 4 times by a script (within a batch update), it has 4 versions: { "count": 1 }, { "count": 2 }, { "count": 3 }, { "count": 4 }. A query for the child document (after a refresh) shows the proper version: { "count": 4 }. But the children aggregation {"sum":{"field":"count"}} shows 10, because 1 + 2 + 3 + 4 = 10. This is quite consistent (e.g. for 5 updates you get 15). It explains the behavior here.

On Tuesday, October 21, 2014 3:18:47 AM UTC+2, Vlad Vlaskin wrote:

Dear ES group, we've been using ES in production for a while and eagerly test all newly arriving features such as cardinality and others. We are trying data modeling with parent-child relations (ES version 1.4.0.Beta1, 8 nodes, EC2 r3.xlarge, SSD, lots of RAM, etc.), with a data model of:

*Parent* { "key": "value" }

and a timeline with children holding metrics:

*Child* (type metric) { "day": "2014-10-20", "count": 10 }

We update metric documents and …
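For the record, Vlad mentioned not knowing how to express the bulk update as curl; a minimal sketch that updates the same child document twice within one _bulk request, reusing the index, type, and ids from the repro above (the exact shape of the update lines is my assumption):

```
# Two script updates against the same child in a single bulk body;
# each action line carries the parent routing, and the NDJSON body
# must end with a newline.
curl -XPOST 'localhost:9200/_bulk' -d '
{"update":{"_index":"test","_type":"metric","_id":"1","_parent":"1"}}
{"script":"ctx._source.count+=ct","params":{"ct":1}}
{"update":{"_index":"test","_type":"metric","_id":"1","_parent":"1"}}
{"script":"ctx._source.count+=ct","params":{"ct":1}}
'
```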
Recurring Heap Problems
Hello ES group,

I have had recurring heap problems (java.lang.OutOfMemoryError: Java heap space) on my 2-node ES cluster (16GB RAM per node, 8GB allocated to ES) over the last month, and I really don't know how to tackle them. It started at a time when I was doing aggregations on a "milliseconds since epoch" field, and I was given to understand that this was probably the cause of my problems, since it created a very large number of buckets before aggregating them. So I stopped doing aggregations on that field (I did not delete it, though). Recently I was told that my index had too few shards relative to its size (2 primary shards, 1 replica each, 100-150M docs). So I decided to try reindexing into a new index with more shards (I am using es-reindex.rb, which itself uses the bulk API). But now OutOfMemoryErrors happen during reindexing. Needless to say, once an OutOfMemoryError happens, my cluster never seems to recover until I reboot each node. It should be noted that I use ES almost exclusively with search_type=count, since I am only trying to do analytics on website data. I am not sure how to proceed from this point; I don't know the right tool to pinpoint my memory problems, and there doesn't seem to be a way to ask ES for heap usage by index/query/task type. I'd be very grateful for any advice you can offer. Thanks in advance, Vincent Bernardi
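There is no per-query heap accounting, but the stats APIs can at least break fielddata (the usual aggregation culprit) down by field and by index; a sketch, assuming a 1.x release where these endpoints exist:

```
# JVM heap plus per-field fielddata usage, reported per node:
curl -s 'localhost:9200/_nodes/stats/indices/fielddata?fields=*'
# The same fielddata breakdown, reported per index:
curl -s 'localhost:9200/_stats/fielddata?fields=*'
```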
Re: Scoring of queries on nested documents
After some investigation: the nested docs get counted individually along with the root doc.

On Tuesday, October 21, 2014 4:55:56 PM UTC+1, ba...@intalex.com wrote:

Thanks for the help, Mark. When calculating relevance, can I assume that TF is the number of times the term appears in the collapsed nested field? I.e. all of the city names get merged into one field, or is it handled a different way? Is the field length norm calculated in the same way? Barry

On Tuesday, October 21, 2014 3:48:15 PM UTC+1, Mark Harwood wrote:

The score_mode setting determines how the scores of the various child docs are attributed to the parent doc, which is the final scored element. See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-nested-query.html#query-dsl-nested-query You can, for example, choose to take the average, max, or sum of all the child documents that match your nested query and reward the parent doc with that value.

On Tuesday, October 21, 2014 9:56:51 AM UTC+1, ba...@intalex.com wrote:

Hello, I am having a problem understanding how scoring of nested documents works. I have found other people with similar questions which have remained unanswered:
http://stackoverflow.com/questions/25619632/elasticsearch-how-is-the-score-for-nested-queries-computed
http://stackoverflow.com/questions/26263562/elasticsearch-boost-score-with-nested-query

The relevant section of my current mapping (with nested parts) is:

"mappings": {
  "person": {
    "properties": {
      "city": {
        "type": "nested",
        "properties": {
          "visityear": { "type": "integer" },
          "name": { "type": "string" }
        }
      }
    }
  }
}

If I have three people who have visited different numbers of cities and I search for a common city they have all visited, I get different score values. The person who visited the greatest number of cities is ranked first, with the person who visited only one city getting a score of 1 (currently ranked lowest). The output of the explanation is that the score is based on 'child doc range from 0 to x'. My question is: how do TF, IDF, and the field norm work for nested documents when the score is being calculated? Many thanks, Barry
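To make Mark's score_mode point concrete, here is a sketch of a nested query in which each matching city contributes a score and the parent person is rewarded with the average (the index name and search term are hypothetical; valid score_mode values include avg, max, and sum):

```
curl -XGET 'localhost:9200/people/person/_search' -d '{
  "query": {
    "nested": {
      "path": "city",
      "score_mode": "avg",
      "query": { "match": { "city.name": "london" } }
    }
  }
}'
```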
Re: copy index
Jörg,

Thanks for the quick turnaround on putting in the fix. What I found when I tested is that it works for test, testcopy, but when I try myindex, myindexcopy it doesn't work. I noticed in the logs when trying myindex that it was looking for an index test, which was a bit odd, so I copied myindex to an index literally named test, and only then did it work. So the only index that can be copied is test; the target index can be anything. Logs:

[2014-10-22 12:05:07,649][INFO ][KnapsackPushAction ] start of push: {"mode":"push","started":"2014-10-22T11:05:07.648Z","node_name":"Pathway"}
[2014-10-22 12:05:07,649][INFO ][KnapsackService ] update cluster settings: plugin.knapsack.export.state - [{"mode":"push","started":"2014-10-22T11:05:07.648Z","node_name":"Pathway"}]
[2014-10-22 12:05:07,650][INFO ][KnapsackPushAction ] map={myindex=myindexcopy}
[2014-10-22 12:05:07,650][INFO ][KnapsackPushAction ] getting settings for indices [test, myindex]
[2014-10-22 12:05:07,651][INFO ][KnapsackPushAction ] found indices: [test, myindex]
[2014-10-22 12:05:07,652][INFO ][KnapsackPushAction ] getting mappings for index test and types []
[2014-10-22 12:05:07,652][INFO ][KnapsackPushAction ] found mappings: [test]
[2014-10-22 12:05:07,653][INFO ][KnapsackPushAction ] adding mapping: test
[2014-10-22 12:05:07,653][INFO ][KnapsackPushAction ] creating index: test
[2014-10-22 12:05:07,672][INFO ][KnapsackPushAction ] count=2 status=OK

I guess you can put in a quick fix? I would have to ask whether anyone else is using this, and what most people are doing. Are there any plans by ES to create a product, or does the snapshot feature suffice for most people? To repeat my requirements: I want to change the mapping types of an existing index, so I create a new index and copy the old index data into it. Thanks in advance.

On Monday, October 20, 2014 8:42:48 PM UTC+1, Jörg Prante wrote:

I admit there is something overcautious in the knapsack release to prevent overwriting existing data. I will add a fix that will allow writing into an empty index. https://github.com/jprante/elasticsearch-knapsack/issues/57 Jörg

On Mon, Oct 20, 2014 at 6:47 PM, eune...@gmail.com wrote:

By the way: ES version 1.3.4, Knapsack version built against 1.3.4. Regards.
Re: copy index
Yes, I can put up a fix - looks weird. Most users have either a constant mapping that can extend dynamically, or one that does not change existing fields. If fields have to change for future documents, you can also change the mapping by using the alias technique:

- keep the old index with the old fields (no change)
- create a new index with the changed fields
- assign an index alias to both indices
- search on the index alias

No copy required.

Jörg
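A sketch of the alias steps Jörg describes, with hypothetical index and alias names: the new index carries the changed mapping, and one alias spans both indices so searches see old and new documents together.

```
# Point the alias "myalias" at both the old and the new index:
curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions": [
    { "add": { "index": "myindex",    "alias": "myalias" } },
    { "add": { "index": "myindex_v2", "alias": "myalias" } }
  ]
}'
# Searches then go through the alias:
curl 'localhost:9200/myalias/_search'
```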
Re: copy index
I think you have to set up the curl command like this:

curl -XPOST 'localhost:9200/yourindex/_push?map=\{yourindex:yournewindex\}'

to push the index yourindex to another one. Note the endpoint. What does your curl command look like?

Jörg
Re: copy index
Hey Jörg,

Correct. Whew! If I run just

curl -XPOST 'localhost:9200/_push?map=\{myindex:myindexcopy\}'

it works fine. By the way: is there any way to make this work in Sense? E.g.

POST /_push?map=\{myindex:myindexcopy\}

POST /_push
{
  "map": { "myindex": "myindexcopy" }
}

The second one will submit in Sense but results in an empty map={}. And is there any plan to put a GUI around it?

Aside: I still see these errors in the ES logs:

[2014-10-22 13:46:25,736][INFO ][client.transport ] [Astronomer] failed to get local cluster state for [#transport#-2][HDQWK037][inet[/10.193
org.elasticsearch.transport.RemoteTransportException: [Abigail Brand][inet[/10.193.5.155:9301]][cluster/state]
Caused by: org.elasticsearch.transport.RemoteTransportException: [Abigail Brand][inet[/10.193.5.155:9301]][cluster/state]
Caused by: java.lang.IndexOutOfBoundsException: Readable byte limit exceeded: 48
at org.elasticsearch.common.netty.buffer.AbstractChannelBuffer.readByte(AbstractChannelBuffer.java:236)
at org.elasticsearch.transport.netty.ChannelBufferStreamInput.readByte(ChannelBufferStreamInput.java:132)
at org.elasticsearch.common.io.stream.StreamInput.readVInt(StreamInput.java:141)
at org.elasticsearch.common.io.stream.StreamInput.readString(StreamInput.java:272)
at org.elasticsearch.common.io.stream.HandlesStreamInput.readString(HandlesStreamInput.java:61)
at org.elasticsearch.common.io.stream.StreamInput.readStringArray(StreamInput.java:362)
at org.elasticsearch.action.admin.cluster.state.ClusterStateRequest.readFrom(ClusterStateRequest.java:132)
at org.elasticsearch.transport.netty.MessageChannelHandler.handleRequest(MessageChannelHandler.java:209)
at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:109)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
System integrations
Hi everyone,

I am looking into the possibility of retrieving data from development tools and displaying some correlations in the data. To do this I first need to index the following systems:

1. Jira
2. Git
3. Sonar
4. Jenkins

I have found connectors for Jira and Git, in the form of rivers:

https://github.com/searchisko/elasticsearch-river-jira
https://github.com/obazoud/elasticsearch-river-git

Does anyone know of existing integrations for Sonar or Jenkins? Thanks, Roel
Re: copy index
I cannot use the HTTP request body for the map, because the body is reserved for a search request, as with the _search endpoint. That way you can push just a part of the index (the search hits) to a new index.

The "failed to get local cluster state for" message is at INFO level, so I think it is not an error.

A GUI is a long-term project in another context, good for the whole community. I am unsure how to develop a replacement for the Sense plugin; maybe a Firefox plugin will arrive some time. I don't know.

Jörg
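Reading Jörg's description, a partial copy would presumably combine the map parameter with a search body; a sketch under that assumption (unverified against the knapsack docs, and the field and value in the query are hypothetical):

```
# Push only the documents matching the query into myindexcopy:
curl -XPOST 'localhost:9200/myindex/_push?map=\{myindex:myindexcopy\}' -d '{
  "query": { "term": { "user": "kimchy" } }
}'
```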
How to use the ES analyzer for compound words?
Here is an example of an index with some documents containing Dutch compound words: https://gist.github.com/herrvonb/0a247aa7dfd0d155b418

"plaatstaal" is a Dutch compound word, so after the custom analyzer "dutch" has been assigned to the field "test", I expected a search for "plaat" to return at least one hit. This is what I get when I search for "plaat":

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "failed" : 0 },
  "hits" : { "total" : 0, "max_score" : null, "hits" : [ ] }
}

Any idea why this is not working? Thanks for your help.
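A quick way to narrow this down is to ask the analyzer which tokens it actually emits for the compound; if "plaat" is not among them, the dutch analyzer is only stemming, not decompounding, and the search can never match. A sketch (the index name is assumed to match the gist):

```
# List the tokens the "dutch" analyzer produces for the compound word:
curl 'localhost:9200/test/_analyze?analyzer=dutch&text=plaatstaal'
```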
Shard Configuration via Puppet Module
Hi,

Does the Puppet module allow configuring the shards for an index? I have Logstash sending data to Elasticsearch, and the default of 5 shards is used; can I change this via Puppet? Or can I set the shards and replicas in the Logstash conf file?

Thank you, -Jose Andres
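Shard count is a per-index setting fixed at creation time, so rather than Puppet or the Logstash output, the usual approach is an index template that every new logstash-* index picks up; a sketch (the shard and replica counts are examples):

```
# New indices matching logstash-* are created with 3 shards and 1 replica:
curl -XPUT 'localhost:9200/_template/logstash_shards' -d '{
  "template": "logstash-*",
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}'
```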
Re: Reading Epoch As @timestamp
Antonio's example works. My problem was a syntax issue, as the Logstash docs do not really have examples and I was not able to figure out the formatting.

On Wednesday, October 22, 2014 1:08:33 AM UTC-4, vineeth mohan wrote:

Hello Antonio, I am aware of this. The example you have quoted should actually work. Why do you feel that it's not working? Thanks, Vineeth

On Tue, Oct 21, 2014 at 7:38 PM, Antonio Augusto Santos mkh...@gmail.com wrote:

If you are using Logstash to push your events to ES, you need something like this:

date { match => [ "field_with_the_epoch", "UNIX" ] }

Read more about it here: http://logstash.net/docs/1.4.2/filters/date

On Tuesday, October 21, 2014 8:43:08 AM UTC-3, ES USER wrote:

For the life of me, my Google searching has not revealed any solution to this, at least none that works for me. I have log data with an epoch timestamp in it and would like to use the date filter in Logstash to overwrite @timestamp with the appropriate timestamp derived from that epoch. Any insight on this would be much appreciated.
Is it possible to access a variable inside the include area?
Hi everyone,

How can I use a variable inside the include area? Is it possible, and how? I'm using a Groovy script in my query, which looks like:

GET /my_index/my_type/_search
{
  "_source": false,
  "query": { "bool": { ... } },
  "aggs": {
    "group": {
      "nested": { "path": "my_path" },
      "aggs": {
        "path_id": {
          "terms": {
            "field": "my_path.id",
            "lang": "groovy",
            "script": "def myVar = (_value.split('.').findIndexOf { it == '3238175' } + 1); myVar + '/' + _value;",
            "include": "myVar/.*",
            "size": 0
          }
        }
      }
    }
  }
}

The variable myVar works inside script. Is it possible to use it inside include? Thanks
Re: How to use the ES analyzer for compound words?
For decompounding, you need a more sophisticated algorithm, like in my plugin https://github.com/jprante/elasticsearch-analysis-decompound which provides decompounding for German words.

Jörg
[hadoop][pig] Using ES UDF to connect over HTTPS through Apache to ES
Hi,

I am trying to configure a system to use both basic authentication and HTTPS to store data to Elasticsearch. My system is configured with a Pig script running through Hadoop, connecting to Apache (configured as a proxy), which forwards the request to Elasticsearch. Using plain HTTP with basic authentication works correctly. However, when I try to force my ES UDF to use HTTPS, I get errors in my Apache logs and my job fails. The relevant snippet of my Pig script is below:

REGISTER /bigdata/cloudera/ES_HadoopJar/elasticsearch-hadoop-2.0.2/dist/elasticsearch-hadoop-2.0.2.jar

DEFINE EsStorage org.elasticsearch.hadoop.pig.EsStorage(
  'es.nodes=https://127.0.0.1:28443',
  'es.net.proxy.http.host=https://127.0.0.1',
  'es.net.proxy.http.port=28443',
  'es.net.proxy.http.user=myuser',
  'es.net.proxy.http.pass=mypass',
  'es.http.retries=10');

data = LOAD ... ...

STORE data INTO 'my_data_index/data' USING EsStorage;

The error output to the Apache log is as follows:

SSL Library Error: error:1407609B:SSL routines:SSL23_GET_CLIENT_HELLO:https proxy request -- speaking HTTP to HTTPS port!?

The error/stacktrace from my map job is as follows:

Error: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings) - all nodes failed; tried [[https://127.0.0.1:28443]]
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:123)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:300)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:284)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:288)
at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:117)
at org.elasticsearch.hadoop.rest.RestClient.discoverNodes(RestClient.java:99)
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverNodesIfNeeded(InitializationUtils.java:59)
at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.init(EsOutputFormat.java:180)
at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.write(EsOutputFormat.java:157)
at org.elasticsearch.hadoop.pig.EsStorage.putNext(EsStorage.java:196)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:635)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:284)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

So my question is: is this possible (i.e. can it work)? And if so, where am I going wrong? Thanks in advance for any help. Aidan
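Given that the Apache error complains about plain HTTP arriving on an HTTPS port, it can help to confirm the proxy side works independently of es-hadoop; a sketch reusing the credentials and port from the script above (-k skips certificate validation):

```
# If this returns the ES root JSON, the HTTPS proxy itself is fine and the
# problem is in how the connector negotiates TLS through the proxy.
curl -k -u myuser:mypass 'https://127.0.0.1:28443/'
```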
Re: Is it possible to access a variable inside the include area?
I solved it with a regex:

GET /my_index/my_type/_search
{
  "_source": false,
  "query": { "bool": { ... } },
  "aggs": {
    "group": {
      "nested": { "path": "my_path" },
      "aggs": {
        "path_id": {
          "terms": {
            "field": "my_path.id",
            "lang": "groovy",
            "script": "def myVar = (_value.split('.').findIndexOf { it == '3238175' } + 1); myVar + '/' + _value;",
            "include": "[1-4]/.*",
            "size": 0
          }
        }
      }
    }
  }
}
Question about Elasticsearch and Spark
Hi:

I have a very simple application that queries an ES instance and returns the count of documents found by the query. I am using the Spark interface as I intend to run ML algorithms on the result set. With that said, here are the problems I face:

1. If I set up the Configuration (to use with newAPIHadoopRDD) or JobConf (to use with hadoopRDD) against a remote ES instance like so (this is using the newAPIHadoopRDD interface):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{MapWritable, Text}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.serializer.KryoSerializer
import org.elasticsearch.hadoop.mr.EsInputFormat

val sparkConf = new SparkConf().setMaster("local[2]").setAppName("TestESSpark")
sparkConf.set("spark.serializer", classOf[KryoSerializer].getName)
val sc = new SparkContext(sparkConf)
val conf = new Configuration // change to new JobConf for the old API
conf.set("es.nodes", "remote.server:port")
conf.set("es.resource", "index/type")
conf.set("es.query", "{\"query\":{\"match_all\":{}}}")
val esRDD = sc.newAPIHadoopRDD(conf, classOf[EsInputFormat[Text, MapWritable]], classOf[Text], classOf[MapWritable]) // change to hadoopRDD for the old API
val docCount = esRDD.count
println(docCount)

the application just hangs at the println (basically while executing the search, or so I think).

2. If I use localhost instead of remote.server:port for es.nodes, the application throws an exception:

Exception in thread "main" org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings) - all nodes failed; tried [[localhost:9200]]
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:123)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:303)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:287)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:291)
at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:118)
at org.elasticsearch.hadoop.rest.RestClient.discoverNodes(RestClient.java:100)
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverNodesIfNeeded(InitializationUtils.java:57)
at org.elasticsearch.hadoop.rest.RestService.findPartitions(RestService.java:220)
at org.elasticsearch.hadoop.mr.EsInputFormat.getSplits(EsInputFormat.java:406)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:179)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1135)
at org.apache.spark.rdd.RDD.count(RDD.scala:904)
at trgr.rd.newsplus.pairgen.ElasticSparkTest1$.main(ElasticSparkTest1.scala:59)
at trgr.rd.newsplus.pairgen.ElasticSparkTest1.main(ElasticSparkTest1.scala)

I am using the 2.1.0.Beta2 version of the elasticsearch-hadoop library, running against a local instance with ES version 1.3.2 and a remote instance with ES version 1.0.0. Any insight as to what I might be missing/doing wrong? Thanks, Ramdev
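For case 2, the exception shows the connector trying only [[localhost:9200]] and failing to connect at all, so a first check is whether a node is actually listening there; a trivial sketch:

```
# Should print the node's banner JSON if a local node is up on port 9200:
curl -s 'localhost:9200/'
```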
Re: How to retrieve cluster and node stats on a data node when HTTP is disabled (http.enabled: false)
The search still uses the HTTP API: we have an ELB and 3 dedicated client nodes behind it, so all search requests go through the ELB via HTTP. Monitoring stats on the client nodes are therefore not a problem; the problem is that I cannot curl localhost:9200 on the dedicated master and data nodes.

On Tuesday, October 21, 2014 11:13:37 PM UTC-7, David Pilato wrote:

How do you search in your cluster? Are you using the Java client?
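Since the client nodes still expose HTTP, the stats of the transport-only nodes can be fetched through them: the nodes stats API accepts a node filter, so any HTTP-enabled node can report on the others. A sketch (the ELB hostname and node name are assumptions):

```
# Cluster-wide stats, answered by a client node but covering every node:
curl 'client-elb:9200/_cluster/stats'
# Stats for one transport-only data node, addressed by its node name:
curl 'client-elb:9200/_nodes/datanode-1/stats'
```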
Re: allow_explicit_index and _bulk
This issue looks to have been fixed by https://github.com/elasticsearch/elasticsearch/issues/4668. However, on elasticsearch-1.3.4, running the example with rest.action.multi.allow_explicit_index: false:

```
POST /foo/bar/_bulk
{ "index": {} }
{ "_id": "1", "baz": "foobar" }
```

I am getting the exception:

```
{
  "took": 1,
  "errors": true,
  "items": [
    {
      "create": {
        "_index": "foo",
        "_type": "bar",
        "_id": "oX0Xp8dzRbySZiKX8QI0zw",
        "status": 400,
        "error": "MapperParsingException[failed to parse [_id]]; nested: MapperParsingException[Provided id [oX0Xp8dzRbySZiKX8QI0zw] does not match the content one [1]];"
      }
    }
  ]
}
```

Am I doing something wrong, or has something changed?

On Thursday, 9 January 2014 at 15:38:46 UTC, Gabe Gorelick-Feldman wrote:

Opened an issue: https://github.com/elasticsearch/elasticsearch/issues/4668

On Thursday, January 9, 2014 3:39:39 AM UTC-5, Alexander Reelsen wrote:

Hey, after having a very quick look, it looks like a bug (or wrong documentation, I need to check further). Can you create a GitHub issue? Thanks! --Alex

On Wed, Jan 8, 2014 at 11:08 PM, Gabe Gorelick-Feldman gabego...@gmail.com wrote:

The documentation on URL-based access control http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/url-access-control.html implies that _bulk still works if you set rest.action.multi.allow_explicit_index: false, as long as you specify the index in the URL. However, I can't get it to work.

POST /foo/bar/_bulk
{ "index": {} }
{ "_id": "1234", "baz": "foobar" }

returns "explicit index in bulk is not allowed". Should this work?
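Separately from the allow_explicit_index question, the 400 above looks like an id mismatch: the action line requests an auto-generated id while the source body carries _id: 1. As far as I can tell, the setting only restricts the explicit index, not the id, so moving the id into the action metadata should avoid the conflict; a sketch:

```
POST /foo/bar/_bulk
{ "index": { "_id": "1" } }
{ "baz": "foobar" }
```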
Limit bug or Limit misunderstanding?
I have a query that I want to return only one document. Basically, I want to do an existence check on a document with a given term filter. I am executing:

POST profiles/profile/_search
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            { "limit": { "value": 1 } },
            { "term": { "profile_id": "salinger-23145" } }
          ]
        }
      }
    }
  }
}

The profiles/profile mapping has tens of millions of documents in it, two of which match the given term filter (when the limit is removed entirely). When I execute the query, I get zero results back. However, if I change the limit value to two (2), then one (1) result is returned. If I change the limit value to three (3), then two (2) results are returned. It's almost as if there is an off-by-one error in limit. So am I:

1) Writing the query wrong? I tried placing the limit outside of the must, bool, and filter clauses. That caused errors in each case, but I may have just done something silly.

2) Misunderstanding limit? My understanding of limit is that it returns no more than x documents per shard. Given that I have five shards and at least two documents matching the query, I should be getting back between one and five documents. However, looking at the limit documentation http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-limit-filter.html I suspect that I may be misunderstanding how limit works. The wording "to execute on" leads me to believe that it may only be selecting ONE document against which the term filter is run; thus, if the one document it tests doesn't match, it returns zero results. However, limit 2 returning one document leads me to believe that my original understanding is correct.

3) Staring at an Elasticsearch limit bug? Unfortunately I have been unable to reproduce the error after creating test indexes and mappings. The limit behaves exactly as I expect in every other case.

4) Doing something else that is equally silly?

Any help or suggestions are appreciated. Can I provide any clarifications?

Thanks, .jpg
Re: Limit bug or Limit misunderstanding?
limit is not a limit for response size. It sets a shard-level limit, which is quite low level, so that per-shard resources in ES are not put under as much pressure. It is not guaranteed that the sum of the per-shard limits matches the total length of the response. The limit parameter for the response is the size parameter. Can you try

```
POST profiles/profile/_search
{
  "size": 1,
  "query": {
    "constant_score": {
      "filter": {
        "term": { "profile_id": "salinger-23145" }
      }
    }
  }
}
```

and see if this works better? If you want to perform a true existence check of a doc, you should use the doc _id and a HEAD request, something like HEAD profiles/profile/id, which is faster than a search.

Jörg
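A concrete version of that HEAD existence check with curl (a sketch; the _id value 12345 is made up, and curl -I issues a HEAD request):

```
# Returns HTTP 200 if the document exists, 404 if not; no body is transferred
curl -I 'localhost:9200/profiles/profile/12345'
```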
Re: Question about Elasticsearch and Spark
An update on this: the exception mentioned in item 2 of my original post was due to the ES instance being down (and for some reason I failed to realise that). That said, I am still having trouble with item 1. The following questions came up:

1. Is there a correlation between the number of shards/replicas on the ES instance and the number of shard-splits that are created for the query request?
2. If the ES instance is on a single shard and has a fairly large number of documents, would the performance be slower?
3. Are there any network latency issues? (I am able to query the instance using the sense/head plugins, and the response time is not bad; it is approximately 28 ms.)

The reason for question 1 is the following output:

```
6738 [main] INFO org.elasticsearch.hadoop.mr.EsInputFormat - Created [2] shard-splits
6780 [main] INFO org.apache.spark.SparkContext - Starting job: count at ElasticSparkTest1.scala:59
6801 [sparkDriver-akka.actor.default-dispatcher-5] INFO org.apache.spark.scheduler.DAGScheduler - Got job 0 (count at ElasticSparkTest1.scala:59) with 2 output partitions (allowLocal=false)
6802 [sparkDriver-akka.actor.default-dispatcher-5] INFO org.apache.spark.scheduler.DAGScheduler - Final stage: Stage 0(count at ElasticSparkTest1.scala:59)
6802 [sparkDriver-akka.actor.default-dispatcher-5] INFO org.apache.spark.scheduler.DAGScheduler - Parents of final stage: List()
6808 [sparkDriver-akka.actor.default-dispatcher-5] INFO org.apache.spark.scheduler.DAGScheduler - Missing parents: List()
6818 [sparkDriver-akka.actor.default-dispatcher-5] INFO org.apache.spark.scheduler.DAGScheduler - Submitting Stage 0 (NewHadoopRDD[0] at newAPIHadoopRDD at ElasticSparkTest1.scala:57), which has no missing parents
6853 [sparkDriver-akka.actor.default-dispatcher-5] INFO org.apache.spark.storage.MemoryStore - ensureFreeSpace(1568) called with curMem=34372, maxMem=503344005
6854 [sparkDriver-akka.actor.default-dispatcher-5] INFO org.apache.spark.storage.MemoryStore - Block broadcast_1 stored as values in memory (estimated size 1568.0 B, free 480.0 MB)
6870 [sparkDriver-akka.actor.default-dispatcher-5] INFO org.apache.spark.scheduler.DAGScheduler - Submitting 2 missing tasks from Stage 0 (NewHadoopRDD[0] at newAPIHadoopRDD at ElasticSparkTest1.scala:57)
6872 [sparkDriver-akka.actor.default-dispatcher-5] INFO org.apache.spark.scheduler.TaskSchedulerImpl - Adding task set 0.0 with 2 tasks
6912 [sparkDriver-akka.actor.default-dispatcher-2] INFO org.apache.spark.scheduler.TaskSetManager - Starting task 0.0 in stage 0.0 (TID 0, localhost, ANY, 18521 bytes)
6917 [sparkDriver-akka.actor.default-dispatcher-2] INFO org.apache.spark.scheduler.TaskSetManager - Starting task 1.0 in stage 0.0 (TID 1, localhost, ANY, 18521 bytes)
6923 [Executor task launch worker-0] INFO org.apache.spark.executor.Executor - Running task 0.0 in stage 0.0 (TID 0)
6923 [Executor task launch worker-1] INFO org.apache.spark.executor.Executor - Running task 1.0 in stage 0.0 (TID 1)
6958 [Executor task launch worker-0] INFO org.apache.spark.rdd.NewHadoopRDD - Input split: ShardInputSplit [node=[ZIbTPE4FSxigrYkomftWQw/Strobe|192.189.224.80:9600],shard=1]
6958 [Executor task launch worker-1] INFO org.apache.spark.rdd.NewHadoopRDD - Input split: ShardInputSplit [node=[ZIbTPE4FSxigrYkomftWQw/Strobe|192.189.224.80:9600],shard=0]
6998 [Executor task launch worker-0] WARN org.elasticsearch.hadoop.mr.EsInputFormat - Cannot determine task id...
6998 [Executor task launch worker-1] WARN org.elasticsearch.hadoop.mr.EsInputFormat - Cannot determine task id...
```

I noticed only two shard-splits being created. On the other hand, when I run the application on localhost with default settings, this is what I get:

```
4960 [main] INFO org.elasticsearch.hadoop.mr.EsInputFormat - Created [5] shard-splits
5002 [main] INFO org.apache.spark.SparkContext - Starting job: count at ElasticSparkTest1.scala:59
5022 [sparkDriver-akka.actor.default-dispatcher-5] INFO org.apache.spark.scheduler.DAGScheduler - Got job 0 (count at ElasticSparkTest1.scala:59) with 5 output partitions (allowLocal=false)
5023 [sparkDriver-akka.actor.default-dispatcher-5] INFO org.apache.spark.scheduler.DAGScheduler - Final stage: Stage 0(count at ElasticSparkTest1.scala:59)
5023 [sparkDriver-akka.actor.default-dispatcher-5] INFO org.apache.spark.scheduler.DAGScheduler - Parents of final stage: List()
5030 [sparkDriver-akka.actor.default-dispatcher-5] INFO org.apache.spark.scheduler.DAGScheduler - Missing parents: List()
5040 [sparkDriver-akka.actor.default-dispatcher-5] INFO org.apache.spark.scheduler.DAGScheduler - Submitting Stage 0 (NewHadoopRDD[0] at newAPIHadoopRDD at ElasticSparkTest1.scala:57), which has no missing parents
5075 [sparkDriver-akka.actor.default-dispatcher-5] INFO org.apache.spark.storage.MemoryStore - ensureFreeSpace(1568) called with curMem=34340, maxMem=511377408
5076
```
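Regarding question 1: es-hadoop typically creates one input split per shard of the target index, so the [2] vs [5] difference above most likely mirrors the shard counts of the two indices. A quick way to inspect the layout (a sketch; the index name myindex is a placeholder, and the address is taken from the log above):

```
# List each shard of the index, its state, and the node it lives on
curl '192.189.224.80:9600/_cat/shards/myindex?v'
```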
CorruptIndexException when trying to replicate one shard of a new index
Created and populated a new index on a 1.3.1 cluster. Primary shards work fine. Updated the index to create several replicas, and three of the four shards replicated, but one shard fails to replicate on any node with the following error (abbreviated some of the hashes for readability):

```
[2014-10-22 20:31:54,549][WARN ][index.engine.internal] [NODENAME] [INDEXNAME][2] failed engine [corrupted preexisting index]
[2014-10-22 20:31:54,549][WARN ][indices.cluster      ] [NODENAME] [INDEXNAME][2] failed to start shard
org.apache.lucene.index.CorruptIndexException: [INDEXNAME][2] Corrupted index [CORRUPTED] caused by: CorruptIndexException[codec footer mismatch: actual footer=1161826848 vs expected footer=-1071082520 (resource: MMapIndexInput(path=DATAPATH/INDEXNAME/2/index/_7cp.fdt))]
    at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:343)
    at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:328)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyInitializingShard(IndicesClusterStateService.java:723)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewOrUpdatedShards(IndicesClusterStateService.java:576)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:183)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:444)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:153)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
[2014-10-22 20:31:54,549][WARN ][cluster.action.shard ] [NODENAME] [INDEXNAME][2] sending failed shard for [INDEXNAME][2], node[NODEID], [R], s[INITIALIZING], indexUUID [INDEXID], reason [Failed to start shard, message [CorruptIndexException[[INDEXNAME][2] Corrupted index [CORRUPTED] caused by: CorruptIndexException[codec footer mismatch: actual footer=1161826848 vs expected footer=-1071082520 (resource: MMapIndexInput(path=DATAPATH/INDEXNAME/2/index/_7cp.fdt))
[2014-10-22 20:31:54,550][WARN ][cluster.action.shard ] [NODENAME] [INDEXNAME][2] sending failed shard for [INDEXNAME][2], node[NODEID], [R], s[INITIALIZING], indexUUID [INDEXID], reason [engine failure, message [corrupted preexisting index][CorruptIndexException[[INDEXNAME][2] Corrupted index [CORRUPTED] caused by: CorruptIndexException[codec footer mismatch: actual footer=1161826848 vs expected footer=-1071082520 (resource: MMapIndexInput(path=DATAPATH/INDEXNAME/2/index/_7cp.fdt))
```

The index is now stuck in a state where the shards try to replicate on one set of nodes, hit this failure, and then switch to trying to replicate on a different set of nodes. I have been looking around to see if anyone has encountered a similar issue but haven't found anything useful yet. Does anybody know if this is recoverable, or should I just scrap it and try building a new one?

- Nate
Re: Limit bug or Limit misunderstanding?
I realize limit is not a limit for response size. I'm actually OK with getting more than one result, and I'm not relying on limit for a size. I often use size in conjunction with limit. I'll do this when I really don't care how many items I get back, as long as it is within a range; I implement the limit to help decrease the load on the shards. That said, I need to understand what expectations I can have around limit. Is it completely non-deterministic? Or can I have reasonable expectations about it? I will propose an example and describe my expectations.

Node setup:
- 1 index
- 1 mapping
- 5 shards
- 1,000,000 documents sharded across the 5 shards
- 1,000 matching documents sharded across the 5 shards

Let's assume an even distribution of the matching documents: 200 documents per shard. I realize it is not realistic to get an exact distribution like this. If I place a limit of 5 on the query, I expect 25 documents back; that is, I get 5 documents from each shard. I expect this because I have at least 5 matching documents per shard (in fact, many more), and I expect the limit to return five documents from each shard. Now, I realize there are lots of real-world circumstances that would cause the query to return fewer than 25 documents; let's ignore those for the time being and remain under the assumption that the distribution is even. And if I place a limit of 1 on the query, I expect 5 documents back. Are these two expectations correct?

Now let's assume a worst-case scenario: all of the matching documents are on one shard. A limit of 5 should still return 5 documents, and a limit of 1 should return 1 document. If these expectations are true, then my original scenario is valid and a limit of 1 should still return 1 document. So are these expectations valid? Or is limit completely non-deterministic?

Size does work, but if I can improve performance with a limit, I would like to do so. It is possible that I have tens of thousands of matching documents, and limit could be an excellent short-circuit. Basically, I want the shard to stop searching as soon as it has found one document. Also, I don't have the document _id, so I cannot make the HEAD call. Do these clarifications help?
Re: [hadoop][pig] Using ES UDF to connect over HTTPS through Apache to ES
That's because es-hadoop currently does not support SSL (and thus HTTPS). There are plans to make this happen in 2.1, but we are not there yet. In the meantime, I suggest trying to use either an HTTP proxy or an HTTP-to-HTTPS proxy.

Cheers,

On 10/22/14 7:11 PM, Aidan Higgins wrote:

Hi, I am trying to configure a system to use both Basic Authentication and HTTPS to store data to Elasticsearch. My system is configured with a Pig script running through Hadoop that connects to Apache (configured as a proxy), which forwards the request to Elasticsearch. Using simple HTTP and Basic Authentication works correctly. However, when I try to force my ES UDF to use HTTPS, I get errors in my Apache logs and my job fails. The relevant snippet of my Pig script is below:

```
REGISTER /bigdata/cloudera/ES_HadoopJar/elasticsearch-hadoop-2.0.2/dist/elasticsearch-hadoop-2.0.2.jar;

DEFINE EsStorage org.elasticsearch.hadoop.pig.EsStorage(
    'es.nodes=https://127.0.0.1:28443',
    'es.net.proxy.http.host=https://127.0.0.1',
    'es.net.proxy.http.port=28443',
    'es.net.proxy.http.user=myuser',
    'es.net.proxy.http.pass=mypass',
    'es.http.retries=10');

data = LOAD ... ;

STORE data INTO 'my_data_index/data' USING EsStorage;
```

The error output to the Apache log is as follows:

```
SSL Library Error: error:1407609B:SSL routines:SSL23_GET_CLIENT_HELLO:https proxy request -- speaking HTTP to HTTPS port!?
```

The error/stacktrace from my Map job is as follows:

```
Error: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings) - all nodes failed; tried [[https://127.0.0.1:28443]]
    at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:123)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:300)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:284)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:288)
    at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:117)
    at org.elasticsearch.hadoop.rest.RestClient.discoverNodes(RestClient.java:99)
    at org.elasticsearch.hadoop.rest.InitializationUtils.discoverNodesIfNeeded(InitializationUtils.java:59)
    at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.init(EsOutputFormat.java:180)
    at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.write(EsOutputFormat.java:157)
    at org.elasticsearch.hadoop.pig.EsStorage.putNext(EsStorage.java:196)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:635)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:284)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
```

So my question is, is this possible (i.e. can it work)? And if so, where am I going wrong? Thanks in advance for any help.

Aidan

--
Costin
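A sketch of the proxy-based workaround Costin suggests, assuming a plain-HTTP proxy at 127.0.0.1:8080 (that port is made up) which performs the HTTPS hop itself; note that es.nodes and es.net.proxy.http.host expect host[:port] values rather than URLs with a scheme, and the exact values depend on how the proxy forwards requests:

```
DEFINE EsStorage org.elasticsearch.hadoop.pig.EsStorage(
    -- es-hadoop speaks plain HTTP to the proxy; the proxy handles TLS
    'es.nodes=127.0.0.1:9200',
    'es.net.proxy.http.host=127.0.0.1',
    'es.net.proxy.http.port=8080',
    'es.net.proxy.http.user=myuser',
    'es.net.proxy.http.pass=mypass');
```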
How to collect docs in order in collect() method of a custom aggregator
I am developing a custom aggregator on ES 1.3.4. It extends the NumericMetricsAggregator.MultiValue class, and its code structure closely resembles that of the Stats aggregator. For my requirements, I need the doc IDs to arrive in ascending order in the overridden collect() method. For most queries, I do get the doc IDs in ascending order. Interestingly, for bool "should" queries with multiple clauses, I get doc IDs in descending order. How can I fix this? Is this a bug?
Re: CorruptIndexException when trying to replicate one shard of a new index
Can you try the workaround mentioned here: http://www.elasticsearch.org/blog/elasticsearch-1-3-2-released/ and see if it works? If the compression issue is the problem, you can re-enable compression later; just upgrade to at least 1.3.2, which has the fix.
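For convenience, the workaround in that post amounts to switching off recovery compression dynamically (a sketch from memory of the release notes; verify the setting name against the post itself, and localhost:9200 is assumed):

```
# Disable compression of shard-recovery traffic until the cluster
# is upgraded to >= 1.3.2; afterwards it can safely be re-enabled
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "indices.recovery.compress": false }
}'
```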
stop phrase removal or remove phrases from search query
Does elasticsearch handle stop-phrase removal? I would like to remove some phrases (only if the words appear in that order) from the search query. Currently I am trying to do this only on the search side. I tried it as follows, but it didn't work:

```
curl -XPUT 'localhost:9200/designs_v1/_settings' -d '
{
  "analysis": {
    "filter": {
      "shingle_omit_unigrams": {
        "type": "shingle",
        "max_shingle_size": 3,
        "output_unigrams": false
      },
      "my_stop": {
        "type": "stop",
        "stopwords": ["walt disney", "magic kingdom", "disney", "kingdom"]
      }
    },
    "analyzer": {
      "shingle": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": ["lowercase", "my_stop", "kstem", "shingle_omit_unigrams"]
      }
    }
  }
}'
```

Does anyone know whether this feature is supported in elasticsearch?

Thanks,
Srini
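One thing that may be worth trying (an untested sketch, not a confirmed answer from the thread): a stop filter compares whole tokens, so multi-word entries such as "walt disney" can never match the single-word tokens produced by the standard tokenizer. If the shingle filter runs before the stop filter, however, each phrase becomes a single shingle token that the stop filter could then drop; my_stop_phrases below is just the original my_stop filter moved to the end of the chain:

```
curl -XPUT 'localhost:9200/designs_v1/_settings' -d '
{
  "analysis": {
    "filter": {
      "shingle_omit_unigrams": {
        "type": "shingle",
        "max_shingle_size": 3,
        "output_unigrams": false
      },
      "my_stop_phrases": {
        "type": "stop",
        "stopwords": ["walt disney", "magic kingdom", "disney", "kingdom"]
      }
    },
    "analyzer": {
      "shingle": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": ["lowercase", "kstem", "shingle_omit_unigrams", "my_stop_phrases"]
      }
    }
  }
}'
```

Note that with output_unigrams set to false, the single-word entries ("disney", "kingdom") would never match anything, and on 1.x the index generally has to be closed before analysis settings can be updated, then reopened afterwards.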
Re: Limit bug or Limit misunderstanding?
I am not sure why you are after limit. It is not a size parameter, and it does not work the way you expect: with 5 shards and a limit of 5, there is no guarantee that you can always obtain 25 docs.

For filters, Elasticsearch has added some Lucene extensions regarding the iteration of doc sets. One extension is the LimitFilter. Lucene uses doc IDs for enumerating docs in the index reader contexts; the IDs carry no meaningful order, but within an iteration they are non-decreasing. There can be many segments on a shard, and each segment carries such a doc ID sequence. On a shard, Elasticsearch iterates through the matching docs of a filter, and when a LimitFilter is applied, this iteration can be short-cut by setting a limit on it. The price to pay is that some of the docs matched by the filter may be dropped. Most users do not want that; this is a very advanced setting. So it is not non-deterministic, it is just very low level.

Jörg
Re: CorruptIndexException when trying to replicate one shard of a new index
After disabling compression, I was able to successfully replicate that shard, so it looks like we're hitting that bug. I guess we'll have to upgrade! Thanks!

- Nate
Re: Wildcards in exact phrase in query_string search
Dara,

Realizing that this is an old post, but I am having this same issue. Was there a suggested solution that got you through?

Eric
Wildcard in an exact phrase query_string search with escaped quotes
Updating a post from 2012. I have a requirement to allow a wildcard within an exact-phrase query_string:

```
POST _search
{
  "query": {
    "query_string": {
      "query": "\"coors brew*\"",
      "analyze_wildcard": true
    }
  }
}
```

I get the following zero-result set:

```
{
  "took": 94,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 0, "max_score": null, "hits": [] }
}
```

My expectation is to get variations of the exact match (below), looking through all fields in our document:
- Coors Brewing
- Coors Brewery
- Coors Brews
- etc.
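As far as the standard query parser goes, wildcards inside quoted phrases are not expanded, so one commonly suggested alternative is match_phrase_prefix, which treats the last term of a phrase as a prefix (a sketch; _all assumes the default catch-all field is still enabled in the mapping):

```
POST _search
{
  "query": {
    "match_phrase_prefix": {
      "_all": "coors brew"
    }
  }
}
```

This only covers a trailing wildcard on the final term, not arbitrary wildcards mid-phrase.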
Custom in memory map/reduce using ES data
Hi,

I have about a billion records on 20 nodes and would like to run custom map/reduce or aggregations (word count, sentiment analysis, etc.) immediately after the ES result set is determined. I came up with using the plugin system to customize aggregation, like this: https://github.com/algolia/elasticsearch-cardinality-plugin/tree/1.0.X/src/main/java/org/alg/elasticsearch/search/aggregations/cardinality. But I want to update the jar quite often, which would eventually require ES to be reloaded. I looked at the scripted metric aggregation (http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.4/search-aggregations-metrics-scripted-metric-aggregation.html) but was not sure about the memory usage or customization, so I decided to run Hazelcast or Spark on the same node or JVM and use their map/reduce frameworks. I use the filter phase to put the ES data into the middleware, like this: https://github.com/medcl/elasticsearch-filter-redis/blob/master/src/main/java/org/elasticsearch/index/query/RedisFilterParser.java#L121. But it just takes quite a long time to put the data into those in-memory middlewares.

Is there a best practice for putting ES data into in-memory middleware, just to re-use the same data efficiently in a subsequent program? I don't think I can use the ES query result set (on each shard), which seems to be in memory, in my own program. Am I right?

Thanks,
Haji
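Since the scripted metric aggregation is one of the options mentioned, here is roughly what a word-count-style use of it looks like (a sketch based on the 1.4 documentation linked above; myindex and the body field are placeholders, the scripts are Groovy-style, and the memory concern raised above still applies since per-shard _agg state is held on-heap):

```
POST myindex/_search?search_type=count
{
  "query": { "match_all": {} },
  "aggs": {
    "word_count": {
      "scripted_metric": {
        "init_script": "_agg.count = 0",
        "map_script": "_agg.count += doc['body'].values.size()",
        "combine_script": "return _agg.count",
        "reduce_script": "long total = 0; for (c in _aggs) { total += c }; return total"
      }
    }
  }
}
```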
Re: CorruptIndexException when trying to replicate one shard of a new index
Thanks for closing the loop.
ElasticSearch deployment architecture with tribe nodes
Hi,

I want to set up an ELK stack infrastructure that streams the logs from two data centers and makes the combined log viewable through a single Kibana console. Each data center has a local Elasticsearch cluster, so I'm considering using tribe nodes to bring the data together. The questions are:

- Because I want to set up the tribe nodes with HA and DR in mind, I'm considering putting two tribe nodes in each data center. Do you see any problem with this setup? Is there any special config I need to be aware of besides what has already been mentioned in the tribe node blog post?
- The tribe node documentation mentions that multicast is enabled by default. Will there be any problem if unicast is used?
- Thinking outside the box a bit more, besides tribe nodes, is there any recommended ES deployment architecture that satisfies my requirements of high availability and a single view of the data from two different data centers?

Thanks,
Connie
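For reference, a minimal sketch of what a tribe node's elasticsearch.yml could look like for this layout, using unicast discovery instead of the multicast default (cluster names, host names, and ports are all made up):

```
# Tribe node that joins the clusters in both data centers
tribe:
  dc1:
    cluster.name: logs-dc1
    discovery.zen.ping.multicast.enabled: false
    discovery.zen.ping.unicast.hosts: ["es-dc1-a:9300", "es-dc1-b:9300"]
  dc2:
    cluster.name: logs-dc2
    discovery.zen.ping.multicast.enabled: false
    discovery.zen.ping.unicast.hosts: ["es-dc2-a:9300", "es-dc2-b:9300"]
```

This reflects one reading of the tribe documentation: each tribe.* block configures an internal client node, so per-cluster discovery settings (including unicast) go inside the block.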