Logstash not connecting to Elasticsearch?
I am running an ELK setup on a single machine. Everything was working fine until I switched off my internet connection. My Logstash console shows this error when I switch off the internet connection:

log4j, [2014-11-27T10:31:57.480] WARN: org.elasticsearch.transport.netty: [logstash-HP-Pro] exception caught on transport layer [[id: 0x7a124750]], closing connection
java.net.SocketException: Network is unreachable
at sun.nio.ch.Net.connect0(Native Method)
at sun.nio.ch.Net.connect(Net.java:465)
at sun.nio.ch.Net.connect(Net.java:457)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:670)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientSocketPipelineSink.connect(NioClientSocketPipelineSink.java:108)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:574)
at org.elasticsearch.common.netty.channel.Channels.connect(Channels.java:634)
at org.elasticsearch.common.netty.channel.AbstractChannel.connect(AbstractChannel.java:207)
at org.elasticsearch.common.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:229)
at org.elasticsearch.common.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:182)
at org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:705)
at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:647)
at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:615)
at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:129)
at org.elasticsearch.discovery.zen.ZenDiscovery.innerJoinCluster(ZenDiscovery.java:338)
at org.elasticsearch.discovery.zen.ZenDiscovery.access$500(ZenDiscovery.java:79)
at org.elasticsearch.discovery.zen.ZenDiscovery$1.run(ZenDiscovery.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

*** EDIT ***
Logstash output config:

output {
  elasticsearch { host => "localhost" }
  stdout { codec => rubydebug }
}

So it seems it's unable to connect to the ES server. Is an internet connection always required? I am new to the setup.
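From the stack trace, the elasticsearch output is trying to join the cluster over the 9300 transport (ZenDiscovery.innerJoinCluster), and that connection attempt dies once the external interface goes down. No internet connection is required as such, but both sides must bind to an interface that stays up. A minimal sketch of a fully local setup, assuming ES 1.x and a Logstash of the same era (option names vary slightly by version; older Logstash releases use the separate elasticsearch_http output instead of protocol => "http"):

# elasticsearch.yml: bind the node to loopback only, so it does not
# depend on an external interface being up
network.host: 127.0.0.1

# logstash: talk HTTP to the local node instead of joining the
# cluster as a transport/node client over 9300
output {
  elasticsearch {
    host => "localhost"
    protocol => "http"
  }
  stdout { codec => rubydebug }
}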
elasticsearch 1.4.0 yellow with shards stuck initializing - need help diagnosing this
Our production cluster went yellow last night after our logstash index rolled over to the next version. I've seen this happen before, but this time I decided to properly diagnose it and seek some feedback on what might be going on. I'm happy to keep this cluster in a yellow state for a limited time to get some help from people in this group diagnosing this properly, and maybe help some others who face the same issues. However, I will need to fix this one way or another before the end of the business day today. I plan to perform a rolling restart to see if node reinitialization fixes things. If not, I'll remove the problematic logstash index and move on. I'd love suggestions for less intrusive solutions. I don't like losing data, and rolling restarts are kind of tedious to babysit; they tend to take 45 minutes or so. Below is some information I've gathered. Let me know if you need me to extract more data.

First the obvious:

{
  "status" : 200,
  "name" : "192.168.1.13",
  "cluster_name" : "linko_elasticsearch",
  "version" : {
    "number" : "1.4.0",
    "build_hash" : "bc94bd81298f81c656893ab1d30a99356066",
    "build_timestamp" : "2014-11-05T14:26:12Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.2"
  },
  "tagline" : "You Know, for Search"
}

[linko@app2 elasticsearch]$ curl localhost:9200/_cluster/health?pretty
{
  "cluster_name" : "linko_elasticsearch",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 5,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 221,
  "active_shards" : 619,
  "relocating_shards" : 0,
  "initializing_shards" : 2,
  "unassigned_shards" : 1
}

So we're yellow, and the reason is initializing and unassigned shards. We have five nodes, of which three are data nodes. It seems we are hitting some kind of resilience issue. The three machines have plenty of disk space and memory.

I found this in the log of one of our ES nodes:

[2014-11-27 10:15:12,585][WARN ][cluster.action.shard ] [192.168.1.13] [logstash-2014.11.27][4] sending failed shard for [logstash-2014.11.27][4], node[o9vhU4BhSCuQ4BmLJjPtfA], [R], s[INITIALIZING], indexUUID [-mMLqYjAQuCUDcczYf5SHA], reason [Failed to start shard, message [RecoveryFailedException[[logstash-2014.11.27][4]: Recovery failed from [192.168.1.14][sE51TBxfQ2q6pD5k7G7piA][es2.inbot.io][inet[/192.168.1.14:9300]] into [192.168.1.13][o9vhU4BhSCuQ4BmLJjPtfA][es1.inbot.io][inet[/192.168.1.13:9300]]{master=true}]; nested: RemoteTransportException[[192.168.1.14][inet[/192.168.1.14:9300]][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[[logstash-2014.11.27][4] Phase[2] Execution failed]; nested: RemoteTransportException[[192.168.1.13][inet[/192.168.1.13:9300]][internal:index/shard/recovery/translog_ops]]; nested: NumberFormatException[For input string: "finished"]; ]]

On the mentioned node there is a corresponding message:

[2014-11-27 10:17:54,187][WARN ][cluster.action.shard ] [192.168.1.14] [logstash-2014.11.27][4] sending failed shard for [logstash-2014.11.27][4], node[o9vhU4BhSCuQ4BmLJjPtfA], [R], s[INITIALIZING], indexUUID [-mMLqYjAQuCUDcczYf5SHA], reason [Failed to perform [indices:data/write/bulk[s]] on replica, message [RemoteTransportException[[192.168.1.13][inet[/192.168.1.13:9300]][indices:data/write/bulk[s][r]]]; nested: NumberFormatException[For input string: "finished"]; ]]

All three data nodes have similar messages happening over and over again. Our cluster has been up for a couple of weeks and seems pretty happy otherwise. I deleted some older logstash indices a few days ago.
The cluster has logstash data and a few smallish indices we use for our inbot.io service. The issue appears to be related to the logstash index rollover. Our app servers and Kibana talk to the two non-data nodes that we run on both our application servers. My next stop was Kibana, which we use on the same cluster with the logstash index that is probably causing us issues. Looking at that, I noticed a few interesting things:

- logstash indexing seems to be fine (good), and it appears there has been no data loss yet
- our CPU load jumped around midnight and sort of stayed up on all three nodes. We measure this using collectd, and both mean and max load jumped to around 1 at the time the index rollover happened.

My next step was using curl -XGET 'localhost:9200/_cat/recovery?v'. All the indices listed there looked fine. I'll spare you the output, but everything appeared to be in the 'done' stage. Finally, I did:

[linko@es3 elasticsearch]$ curl -XGET 'localhost:9200/_cluster/health/logstash-2014.11.27/?pretty'
{
  "cluster_name" : "linko_elasticsearch",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 5,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 5,
  "active_shards" : 12,
  "relocating_shards" : 0,
  "initializing_shards" : 2,
  "unassigned_shards" : 1
}

So that confirms last night's new logstash index is the issue.
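As a supplement to _cat/recovery, the _cat/shards API shows exactly which shard copies are not started and where they live; a quick sketch for ES 1.x, filtering out the healthy ones:

curl -XGET 'localhost:9200/_cat/shards?v' | grep -v STARTED

# each remaining row shows index, shard number, primary/replica (p/r),
# state (INITIALIZING / UNASSIGNED) and the node it is assigned to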
Kibana: how to get all _types by using _index
I want to search on an index and get all the index types under that index. I want to create a terms panel/table for it. It works at the index_type level but not at the index level. I'm also unable to search by index. I used the filter _index:name_of_index and it returned no results, but _type:name_of_index_type works fine for searching on type; it returned the expected result. How can I achieve this using Kibana?
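Outside Kibana, one way to list all the types an index contains is simply to read its mapping; a sketch for ES 1.x, where an index can hold multiple types (name_of_index is a placeholder):

curl -XGET 'localhost:9200/name_of_index/_mapping?pretty'

# the keys under "mappings" in the response are the type names

Since _type is an indexed field in 1.x, a terms panel on the _type field may then give the per-type breakdown, provided the dashboard is pointed at the right index in its settings.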
Requests per second
Hello, I don't really know how to measure the number of requests per second made to an Elasticsearch cluster. I would like to know specifically the number of search requests per second. If I am not wrong, the query_total stat in search increases by the number of shards being accessed in every search, so if it increases by, let's say, 25 units, that does not mean you had 25 _search requests, but that ES has used 25 shards in total to answer a single (or several) _search request(s). So, is it possible at all to know the number of _search requests per second that your cluster receives? If not, is there any close approach to this? Thank you very much, Ernesto
Re: Requests per second
You have to count the requests per second in your client. Jörg
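If a rough server-side figure is enough, you can sample Elasticsearch's own counters and diff them over an interval. A sketch against the ES 1.x index stats API; as noted in this thread, query_total counts shard-level operations, so this gives shard queries per second, not client _search requests (divide by your typical shard fan-out to estimate the latter):

# shard-level queries per second, sampled over 10 seconds;
# grabs the first query_total in the response (the cluster-wide primaries counter)
t0=$(curl -s 'localhost:9200/_stats/search' | grep -o '"query_total":[0-9]*' | head -1 | cut -d: -f2)
sleep 10
t1=$(curl -s 'localhost:9200/_stats/search' | grep -o '"query_total":[0-9]*' | head -1 | cut -d: -f2)
echo "$(( (t1 - t0) / 10 )) shard-level queries/sec"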
Re: Requests per second
Hi Ernesto, you may use a tool like JMeter to throw randomized queries and other operations at an Elasticsearch instance. JMeter will take care of counting requests per time and possibly also register other parameters you may use for a more detailed evaluation. At the end of the day, the total client count of queries that were processed in a given time frame is important. Don't rely on Elasticsearch-internal counters that may include other requests not visible to clients. The net performance should be measured end-to-end, possibly not even on the REST service or transport level, but at the respective front-end application (where one page view may trigger multiple Elasticsearch requests). Best regards, --Jürgen
Re: Requests per second
Ok, so I guess the answer is no: there is no way to know it from the ES point of view. If you want to know it, you have to count them from the client. Thank you very much for the answers :)
Re: ES security measures?
It is no different from other distributed software. There are many facets of security. If you want authorized access, add a system which authenticates users and manages roles; Elasticsearch does not do this for you. If you want others to not read the Elasticsearch data traffic, set up a private network (http://en.wikipedia.org/wiki/Private_network) with your own gateway/router plus a reverse proxy for internet access. If you want to trust your Elasticsearch cluster and keep others from tampering with your data, then set up all the hardware and the network connection yourself and lock others out from physical access to the facility. You can also wait for the Elasticsearch security extension which has been announced. Jörg

On Thu, Nov 27, 2014 at 6:39 AM, Siddharth Trikha siddharthtrik...@gmail.com wrote:

I have set up my ELK stack on a single server and tested it on a very small setup to get hands-on with ELK. I want to use ELK for analysis of my system logs. Now, I have been reading that ES has no security. I also read something like this: "DO NOT have ES publicly accessible. That's the equivalent of making your Wordpress MySQL database accessible to the world. ES is a REST-accessible DB, which means that anyone with access to the endpoint can delete all of your data." I am a noob in this. So does this mean that if I put my logs in ES they will be accessible to everyone (which is scary)? Please guide me on what security measures must be taken, and please suggest some links so that I can ensure security. How do I keep my ES cluster private?
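To make the reverse-proxy point concrete, here is a minimal nginx sketch (paths and names are hypothetical; add TLS and restrict allowed methods and endpoints for real use), with Elasticsearch itself bound to loopback via network.host: 127.0.0.1 so that only the proxy can reach it:

server {
    listen 8080;                                   # put TLS here in production
    auth_basic           "Elasticsearch";
    auth_basic_user_file /etc/nginx/es.htpasswd;   # created with htpasswd
    location / {
        proxy_pass http://127.0.0.1:9200;          # ES listening on loopback only
    }
}

This gives basic authentication and keeps the raw 9200/9300 ports off the public network; it is a starting point, not a complete security story.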
Re: Behavior of multi_field at index and query time
I hate to bump this post, but I would really appreciate it if anyone has any input regarding this. Regards, Nils-Helge Garli Hegvik

On Tuesday, November 25, 2014 4:20:49 PM UTC+1, nil...@gmail.com wrote:

We have a mapping where one of the fields is an integer, but we want to change this to a double. We want to avoid re-indexing, since there will be a lot of documents at migration time. Hence, we were considering using a multi_field (now apparently deprecated, but I guess the same applies to the fields of a property) for this scenario, where the field is treated as both an integer and a double. This means that on the day of migration, all the old documents will only have the integer value set, and all new documents will have the double value set. In our code, we will only treat this value as a double. We have been doing some testing, and it seems like it should work, but I would like to confirm that our findings are expected rather than by chance. Let's call the field "the_field" and the double property "double", like this:

"the_field": {
  "type": "integer",
  "fields": {
    "double": { "type": "double" }
  }
}

- Without changing our indexing code, when writing a double value to the_field, the value is automatically written to the field as a double. When fetching the document back, the value of the_field is a double. New and old documents look the same: old documents have integer values and new documents have double values for the_field. I would expect new documents to have the_field.double in the result instead, but this does not seem to be the case (which is good for us, if that is intended).
- When querying the_field, say with a range query, both old and new documents appear in the result, but the decimal part of the value in the new documents is ignored. So 2.534 is treated as the value 2 in the range (or in sorting). This means that if the range is lte: 2, then even double values up to 3 are included in the range.
- When querying the_field.double, say with a range query, both old and new documents appear in the result, and the values from both old and new documents are treated as double values, as opposed to the previous example. So if the range is lte: 2, then only integer _and_ double values <= 2 are included in the range.

Are these observations correct and expected? Or is it a side effect of some kind that we should not rely upon? And I assume the rules for queries also apply to aggregations? If this is in fact the expected behavior, is it possible to alias the_field.double to the_field in queries, so it is by default treated as the double value? Regards, Nils-Helge Garli Hegvik
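A quick way to reproduce and check this behaviour on a scratch index (a sketch; index, type and id names are made up):

curl -XPUT 'localhost:9200/mftest' -d '{
  "mappings": { "doc": { "properties": {
    "the_field": { "type": "integer",
                   "fields": { "double": { "type": "double" } } }
  } } }
}'
curl -XPUT 'localhost:9200/mftest/doc/1' -d '{ "the_field": 2.534 }'
curl -XPOST 'localhost:9200/mftest/_refresh'

# range on the integer view: 2.534 is indexed as 2, so this matches
curl -XGET 'localhost:9200/mftest/doc/_search?pretty' -d '{
  "query": { "range": { "the_field": { "lte": 2 } } } }'

# range on the double view: 2.534 > 2, so this should not match
curl -XGET 'localhost:9200/mftest/doc/_search?pretty' -d '{
  "query": { "range": { "the_field.double": { "lte": 2 } } } }'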
Re: how to migrate lucene index into elasticsearch
Otis, I am not sure how many of our customers will accept re-indexing the whole data set, as they have been using it for a long time, although I am trying to convince my Senior Product Management to keep both Lucene and ES. Some old customers may decide to migrate to ES if they need better real-time performance through distributed ES. Note: currently, the major reason to migrate from Lucene to ES is to get better distributed support for faster real-time search. I have an embedded search engine in our product which is based on Lucene 4.8.1, and now I would like to migrate it to the latest Elasticsearch 1.4 for better distributed support (sharding and replication, mainly). Thanks, Gaurav

On Sun, Nov 23, 2014 at 4:11 AM, joergpra...@gmail.com wrote:

I can not tell if it will work, but if you could translate your XML mapping into an Elasticsearch mapping it would be great. The next steps would be to create an empty index with the mapping, using 1 shard and no replicas, with _source and _all disabled. Then you could index one test doc over the ES API. After this, you can find out in the data folder where ES created the segments files. By exchanging them with a copy of your Lucene segment files, they should get picked up, or you get nasty errors because ES uses a custom Lucene index format and can not process standard Lucene segments. Jörg

On Thu, Nov 20, 2014 at 2:26 PM, Gaurav gupta gupta.gaurav0...@gmail.com wrote:

Thanks Jörg for the guidance. I am trying suggested approach #1 and have a further question on it. As you mentioned: "a custom written tool could traverse the segments and extract field information and build a rudimentary mapping (without analyzer, without info about _all and _source and all Elasticsearch add-ons)." We already have the Lucene index metadata (i.e. field names, types, analyzers, etc.) available as XML, so I can create the mapping without traversing the segments. Should I create the segments.gen segment file using the mapping file and some dummy values, and then put in all the other old Lucene index files (except segments.gen) from the existing Lucene index (e.g. segments_2, _0.cfe, _0.cfs, _0.si, _1.cfe, _1.cfs, etc.)?

Sample mapping XML file:

<Mapping>
  <indexField>
    <analyzed>true</analyzed>
    <fieldanalyzer>Standard</fieldanalyzer>
    <indexFieldName>AddressLine1</indexFieldName>
    <name>AddressLine1</name>
    <stored>true</stored>
    <type>string</type>
  </indexField>
  <indexField>
    <analyzed>true</analyzed>
    <fieldanalyzer>Standard</fieldanalyzer>
    <indexFieldName>Building_Name</indexFieldName>
    <name>Building_Name</name>
    <stored>true</stored>
    <type>string</type>
  </indexField>
  <indexField>
    <analyzed>true</analyzed>
    <fieldanalyzer>Keyword</fieldanalyzer>
    <indexFieldName>GNAF_PID</indexFieldName>
    <name>GNAF_PID</name>
    <stored>true</stored>
    <type>string</type>
  </indexField>
  ...
</Mapping>

Thanks

On Thu, Nov 13, 2014 at 11:59 PM, joergpra...@gmail.com wrote:

It is almost impossible to use just a binary-only Lucene index for migration, because Elasticsearch needs additional info which is not available in Lucene. The only method is to reindex the data over the Elasticsearch API. There is a bumpy road, but I don't know if anyone ever tried it:
- a custom written tool could traverse the segments, extract field information and build a rudimentary mapping (without analyzer, without info about _all and _source and all Elasticsearch add-ons)
- another tool could try to reconstruct docs (like the tool Luke does) and write them to a file in bulk format.
  Not having the source of the docs means it must be possible to retrieve the original input from the Lucene index (which is almost never the case)
- the result could be re-indexed using the Elasticsearch API (assuming all analyzers and tokenizers are in place), but a lot of work would have to be done

The preferred way is to rewrite the code that uses the Lucene API to use the Elasticsearch API and re-run the indexing process. Jörg

On Thu, Nov 13, 2014 at 7:11 PM, Gaurav gupta gupta.gaurav0...@gmail.com wrote:

Hi All, I have an embedded search engine in our product which is based on Lucene 4.8.1, and now I would like to migrate it to the latest Elasticsearch 1.4 for better distributed support (sharding and replication, mainly). Could you guide me on how one should migrate the existing indexes created by Lucene to ES? I have referred to the mail thread "migrate lucene index into elasticsearch" (https://groups.google.com/forum/#!searchin/elasticsearch/migrating/elasticsearch/xCE7124eAL8/ZFluLXqO_IcJ), and based on the discussion in it, it appears to me that it's not an easy job, or perhaps not feasible at all. I am wondering if there is some plugin (river), tool, or workaround available to migrate existing Lucene indexes to ES. I googled that a plugin is available for Solr to ES migration (http://blog.trifork.com/2013/01/29/migrating-apache-solr-to-elasticsearch/). Do we have something similar for Lucene to ES?
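For reference, the "bulk format" Jörg mentions is just alternating action and source lines; a minimal sketch of what such a reconstruction tool would emit and how to load it (index, type and field values here are made up from the mapping above):

{"index":{"_index":"myindex","_type":"address","_id":"1"}}
{"AddressLine1":"10 Example St","Building_Name":"Example House","GNAF_PID":"GA123"}

# load the file with the bulk API (the file must end with a newline)
curl -XPOST 'localhost:9200/_bulk' --data-binary @bulk.jsonl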
ClusterBlockException after closing an index
Hi, I hope this is the correct group for asking about this behavior; sorry in advance if it isn't, but I would greatly appreciate some help. I'm pretty new to Elasticsearch itself, currently trying to set up an ELK stack to analyze a couple of logs. Recently, I've been looking around for a mechanism for retiring data, and I started looking into the flush/close/delete options in ES. What I would like to do ultimately is create a cron job (maybe using Curator [1] for help) that would close indexes after a couple of days and delete them after a week. So as a test, I tried closing an index to see what would happen, and after I closed it, I noticed this error in the Kibana interface:

Oops! ClusterBlockException[blocked by: [FORBIDDEN/4/index closed];]

The same doesn't happen if I simply delete the index. I would like to continue seeing data from more recent indexes in Kibana, but simply close older ones. What am I doing wrong? Is there a way for me to keep some older (Logstash-sent) data closed in case I need to open and query it later? Best regards, Bruno C.

[1] https://github.com/elasticsearch/curator/
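As an illustration of the close/delete cron idea without Curator, a minimal curl-based sketch (assumes the default logstash-%Y.%m.%d index naming and GNU date; run once a day from cron):

# close the index from 2 days ago, delete the one from 7 days ago
curl -XPOST "localhost:9200/logstash-$(date -d '2 days ago' +%Y.%m.%d)/_close"
curl -XDELETE "localhost:9200/logstash-$(date -d '7 days ago' +%Y.%m.%d)"

# reopen a closed index later if you need to query it again
curl -XPOST "localhost:9200/logstash-2014.11.20/_open"

The Kibana error itself comes from the dashboard still trying to query the closed index; restricting the dashboard's index pattern or time window to indices that are still open avoids it.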
Re: how to migrate lucene index into elasticsearch
Thanks Jörg, but I wasn't able to migrate the Lucene indexes to ES even after trying what you suggested. Maybe I need to follow some more steps. I am not getting any error, but the search is not showing any docs/records. While comparing the files, I found that the segments.gen files are identical, but the segments_N files (segments_2 in Lucene and segments_3 in ES) are slightly different. [image: Inline image 1] Lucene vs ES: [image: Inline image 2] Thanks, Gaurav
Re: elasticsearch 1.4.0 yellow with shards stuck initializing - need help diagnosing this
This looks like a mapping issue to me (not 100% sure). A document that is in the translog has a string field (with value 'finished'), but it is mapped as a number field (long, integer, double, etc.) in the mapping. This causes the number format exception that you're seeing in your logs when that document is indexed from the translog as part of the recovery, and this then prevents the shard from getting started. These problems can occur when new fields are introduced at index time, and also when numeric_detection is enabled in the mapping (which makes these errors more likely). Is this the case in your ES setup? Can you also check the mappings of the logstash-2014.11.27 index and see what fields could possibly contain 'finished'? Unfortunately the field name didn't get included with your errors.
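To follow Martijn's suggestion of checking the mapping, a grep-based sketch that hunts for numeric fields in the pretty-printed mapping (you can then check which of them your events might be sending 'finished' into):

curl -XGET 'localhost:9200/logstash-2014.11.27/_mapping?pretty' | grep -B 1 '"type" : "long"'

# repeat for "integer", "double", etc.; the line above each match
# is the field name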
Re: elasticsearch 1.4.0 yellow with shards stuck initializing - need help diagnosing this
Thanks for the explanation. I suspect many logstash users might be running into this one, since you typically use a dynamic mapping with that. We have some idea of where this is happening, though, and we can probably fix it properly. This happened during index rollover, and we are indeed indexing a lot of things via logstash almost continuously. Jilles
Re: elasticsearch 1.4.0 yellow with shards stuck initializing - need help diagnosing this
If the field you suspect of causing this is a string field in the mapping, then you can try to close and open the index. This will sync the in-memory representation of the mapping with what is in the cluster state.
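For completeness, the close/open cycle mentioned is just the two index APIs, using the index name from this thread:

curl -XPOST 'localhost:9200/logstash-2014.11.27/_close'
curl -XPOST 'localhost:9200/logstash-2014.11.27/_open'

Note that the index is unavailable for search and indexing while it is closed.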
Re: elasticsearch 1.4.0 yellow with shards stuck initializing - need help diagnosing this
BTW, I should mention that I also filed a bug for this earlier today: https://github.com/elasticsearch/elasticsearch/issues/8684. Clinton Gormley kindly replied to it and provided some additional insight. It does seem our mapping is part of the problem, but there is also the ES side of things, where it shouldn't get into this state. Apparently a fix for that part is coming. Best, Jilles
Re: Is re-election/assignment of the master node possible?
The load issue affecting master detection/election shouldn't happen if you have dedicated masters; at least that is the case with 0.90.x. (With my limited knowledge of ES implementation details, there seems to be a lock or priority issue when serving a large number of requests (HTTP/Thrift) that affects cluster/metadata updates. I would think these metadata tasks ought to take priority in some cases over queries.)

On 26/11/2014 6:11 pm, Erik theRed j.e.redd...@gmail.com wrote:

Thanks, Nik. There's no data on the node, so it sounds like master re-election should fail over fairly quickly.

On Wednesday, November 26, 2014 2:58:43 PM UTC-6, Nikolas Everett wrote:

On Wed, Nov 26, 2014 at 3:47 PM, Erik theRed j.e.r...@gmail.com wrote:

Is there any notion of triggering a re-election of the master node? I'm currently running 1.2.4, and I have an instance that is scheduled for retirement (my favorite!) and it just so happens that it's my master node. What can I do to avoid the dreaded RED state? Is there some mechanism that allows me to re-assign the current master to one of the other two available dedicated master nodes so I can reboot the current master?

Move all the shards off of the node using allocation include/exclude settings. If you shoot the master, one of the other master-eligible nodes will take over quickly and there won't be any interruptions.

I ask because I'm a bit gun-shy due to my experience when an elected master node went unresponsive (before I created dedicated masters) due to excessive HTTP connections; master re-election seemed to never occur and everything came crumbling down.

I've never had that problem. My cluster is pretty small though - only 31 nodes. Nik
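The allocation exclude step Nik describes looks like this (a sketch; the IP is a placeholder for the node being retired):

# drain shards off the node before shooting it
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient" : {
    "cluster.routing.allocation.exclude._ip" : "192.168.1.99"
  }
}'

# watch the shards move away
curl -XGET 'localhost:9200/_cat/shards?v'

Once the node holds no shards, it can be rebooted or retired without the cluster going red.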
Re: Hive write data to elastic search
Hi David, did you find any fix for this issue? I am also facing the same problem.

Thanks - Atul

On Tuesday, July 8, 2014 8:37:09 AM UTC-7, David Zabner wrote:

Hi all, I am trying to write data to Elasticsearch from Hive, and whenever I try I get this error:

org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No resource ['es.resource'] (index/query/location) specified

The script I am running looks like this:

USE pl_10;
ADD jar /home/hdfs/sql-tests/schema/elasticsearch-hadoop-2.0.0.jar;
CREATE EXTERNAL TABLE IF NOT EXISTS REGIONES (
  R_REGIONKEY INT,
  R_NAME STRING,
  R_COMMENT STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource'='radio/artists', 'es.nodes'='elastic-1');
INSERT OVERWRITE TABLE REGIONES select R_REGIONKEY, R_NAME, R_COMMENT from REGION;

Any help would be much appreciated
Re: elasticsearch deployment advice
1 - Depends on your use. 2 - Yes there is; see http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.html#circuit-breaker

On 28 November 2014 at 07:17, Denis J. Cirulis denis.ciru...@gmail.com wrote:

Hello, I need to plan a new deployment of Elasticsearch. Single node, 128GB RAM, for log indexing (about 50 million records a day).

1. What's the best heap size for Elasticsearch 1.4 (running Oracle Java 7u72)?
2. Is there some kind of query throttling technique to stop deep drill-downs, to prevent ES out-of-memory errors?

Thanks in advance.
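As a hedged sketch of tightening the fielddata circuit breaker referenced above (setting name as of 1.4; the 40% value is illustrative, the default is higher):

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "persistent": {
    "indices.breaker.fielddata.limit": "40%"
  }
}'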
Re: elasticsearch deployment advice
20-30GB per index a day. I've read in the setup guide that a heap larger than 32GB is useless; that's why I'm asking.
Re: elasticsearch deployment advice
We have 128GB on some nodes and run 30GB heaps. Lucene memory-maps files, so the extra memory is put to good use. The 32GB limit comes from the JVM compressing pointers: it can't compress them above 32GB, so you see everything expand in size.

On Nov 27, 2014 4:18 PM, Denis J. Cirulis denis.ciru...@gmail.com wrote:

20-30GB per index a day. I've read in the setup guide that a heap larger than 32GB is useless; that's why I'm asking.
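For what it's worth, a minimal sketch of pinning a 30GB heap on 1.x (assuming the tarball distribution; package installs usually set this in /etc/default/elasticsearch or /etc/sysconfig/elasticsearch instead):

# leave the rest of the RAM to the OS page cache for Lucene's memory-mapped files
export ES_HEAP_SIZE=30g
./bin/elasticsearch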
Re: Hive write data to elastic search
Hi Atul,

What does your Hive script look like? What versions of Hive and es-hadoop are you using? Can you post them, along with the stacktrace, on Gist or Pastebin [1]?

The exception message is pretty straightforward - either 'es.resource' is missing or the resource type is incorrectly specified [2].

Cheers,

[1] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/2.1.Beta/troubleshooting.html
[2] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/2.1.Beta/hive.html#hive-configuration

On 11/27/14 8:57 PM, Atul Paldhikar wrote:

Hi David, did you find any fix for this issue? I am also facing the same problem. Thanks - Atul

-- Costin
Re: Can't integrate Elasticsearch with Hive
Hi,

The issue is most likely caused by two different versions of es-hadoop in your classpath, probably es-hadoop 2.0.x (2.0.2) and 2.1.x (2.1.0.Beta3). If both are picked up by Hive or Hadoop, the JVM ends up with two jars containing classes under the same package name. This leads to weird conflicts, as classes from one jar can interact with classes from the other, especially since the code internally went through major changes between 2.0.x and 2.1.x.

Make sure you have only one version of es-hadoop in your classpath - both on the client and in the cluster. That includes the Hive classpath and the Hadoop classpath, as well as the submitting jar (since the library might be embedded).

P.S. IllegalAccessError indicates an illegal call - such as calling a non-public class from a different class. However, in this case both classes are in the same package and the HiveUtils class is not private...

Cheers,

On 11/27/14 9:19 AM, Atul Paldhikar wrote:

Hi All, I am using Hive 0.13.1 and trying to create an external table so data can be loaded from Hive to Elasticsearch. However I keep getting the following error. I have tried with the following jars but get the same error. I would really appreciate any pointers. Thanks - Atul

<property>
  <name>hive.aux.jars.path</name>
  <!-- <value>/apps/sas/elasticsearch-hadoop-2.0.2/dist/elasticsearch-hadoop-2.0.2.jar</value> -->
  <value>/apps/sas/elasticsearch-hadoop-2.1.0.Beta3/dist/elasticsearch-hadoop-2.1.0.Beta3.jar</value>
  <description>A comma separated list (with no spaces) of the jar files</description>
</property>

ERROR:

2014-11-26 23:09:22,069 ERROR [main]: exec.DDLTask (DDLTask.java:execute(478)) - java.lang.IllegalAccessError: tried to access class org.elasticsearch.hadoop.hive.HiveUtils from class org.elasticsearch.hadoop.hive.EsSerDe
at org.elasticsearch.hadoop.hive.EsSerDe.initialize(EsSerDe.java:81)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:339)
at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:288)
at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:281)
at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:631)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:593)
at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4189)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:281)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

2014-11-26 23:09:22,069 ERROR [main]: ql.Driver (SessionState.java:printError(545)) - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. tried to access class org.elasticsearch.hadoop.hive.HiveUtils from class org.elasticsearch.hadoop.hive.EsSerDe

-- Costin
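One way to verify, as a hedged sketch (the search roots are assumptions based on the paths quoted above):

find /apps/sas /apps/hadoop-2.5.1 -name 'elasticsearch-hadoop*.jar' 2>/dev/null

If more than one version shows up across the Hive and Hadoop lib directories, remove the extras and restart the services.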
Re: ES security measures?
I have all my clusters behind Amazon's VPC security groups, but this week we are facing the need to let frontend clients (JavaScript) access the ES indexes. There is an auth plugin (https://github.com/codelibs/elasticsearch-auth) which seems interesting. It lets you limit access to data by user, password, role, protocol and index (it does not mention anything about types). I have not tested it yet, but I want to share it because it may be useful for someone else.

-- Iván González Valiente
Systems programmer

2014-11-27 13:23 GMT+01:00 joergpra...@gmail.com joergpra...@gmail.com:

It is no different from other distributed software. There are many facets of security. If you want authorized access, add a system which authenticates users and manages roles; Elasticsearch does not do this for you. If you want others not to read the Elasticsearch data traffic, set up a private network (http://en.wikipedia.org/wiki/Private_network) with your own gateway/router plus a reverse proxy for internet access. If you want to trust your Elasticsearch cluster and keep others from tampering with your data, then set up all the hardware and the network connection yourself and lock others out from physical access to the facility. You can also wait for the Elasticsearch security extension which has been announced.

Jörg

On Thu, Nov 27, 2014 at 6:39 AM, Siddharth Trikha siddharthtrik...@gmail.com wrote:

I have set up my ELK stack on a single server and tested it on a very small setup to get hands-on with ELK. I want to use ELK for my system log analysis. Now, I have been reading that ES has no security. I also read something like this: "DO NOT have ES publicly accessible. That's the equivalent of making your Wordpress MySQL database accessible to the world. ES is a REST-accessible DB, which means that anyone with access to the endpoint can delete all of your data." I am a noob at this. So does this mean that if I put my logs in ES they will be accessible to everyone (which is scary)? Please guide me on what security measures must be taken, and please suggest some links so that I can ensure security. How do I keep my ES cluster private?
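For readers in the same situation, a minimal sketch of a private-by-default elasticsearch.yml (1.x settings; a baseline, not a complete hardening guide):

# bind only to loopback or a private interface so ES is not publicly reachable
network.host: 127.0.0.1
# 1.x setting; turning dynamic scripting off reduces the attack surface
script.disable_dynamic: true

External clients then go through a reverse proxy that performs authentication, along the lines Jörg describes.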
Re: Hive write data to elastic search
Hi Costin,

Actually, I think I figured out the issue - my script had a typo ('es.resources' instead of 'es.resource'):

create external table ex_address (name STRING, st_no INT, st_name STRING, city STRING, state STRING, zip INT)
stored by 'org.elasticsearch.hadoop.hive.EsStorageHandler'
tblproperties('es.resources' = 'employee/address');

However, for some reason I am back to square one with the original problem mentioned in the other thread [Can't integrate Elasticsearch with Hive]. I have started getting the old exception again when accessing the external table; nothing really changed in the environment!

java.lang.IllegalAccessError: tried to access class org.elasticsearch.hadoop.hive.HiveUtils from class org.elasticsearch.hadoop.hive.EsSerDe

Thanks - Atul

On Thursday, November 27, 2014 2:59:49 PM UTC-8, Costin Leau wrote:

Hi Atul, What does your Hive script look like? What versions of Hive and es-hadoop are you using? Can you post them, along with the stacktrace, on Gist or Pastebin [1]? The exception message is pretty straightforward - either 'es.resource' is missing or the resource type is incorrectly specified [2]. Cheers, [1] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/2.1.Beta/troubleshooting.html [2] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/2.1.Beta/hive.html#hive-configuration

-- Costin
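For reference, the corrected statement with the singular property name would look like this (a sketch that keeps the rest of the definition from the script above unchanged):

create external table ex_address (name STRING, st_no INT, st_name STRING, city STRING, state STRING, zip INT)
stored by 'org.elasticsearch.hadoop.hive.EsStorageHandler'
tblproperties('es.resource' = 'employee/address');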
Re: Can't integrate Elasticsearch with Hive
Hi Costin,

Actually, even that issue is resolved :) There is a spelling difference in the samples available on the web: all of them have the storage class as "EsStorageHandler", but only your GitHub post says it is "ESStorageHandler", which is right (https://gist.github.com/costin/8025827)! The error should have been more accurate if I was using a wrong class name.

Now the next problem: the MapReduce job is failing for some reason. I am still a beginner in Hadoop, so I am not exactly sure where to debug. Here are some logs; it looks like some bad character "#" in the job.xml file. But that is generated by Hive, right?

Hive Log:

hive> insert overwrite table ex_address select name, st_no, st_name, city, state, zip from employee.address;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1417158738771_0001, Tracking URL = http://finattr-comp-dev-01:8088/proxy/application_1417158738771_0001/
Kill Command = /apps/hadoop-2.5.1/bin/hadoop job -kill job_1417158738771_0001
Hadoop job information for Stage-0: number of mappers: 0; number of reducers: 0
2014-11-27 23:13:37,547 Stage-0 map = 0%, reduce = 0%
Ended Job = job_1417158738771_0001 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Job 0: HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec

Container Job Logs - Stderr:

[sas@finattr-comp-dev-01 container_1417158738771_0001_02_01]$ cat stderr
[Fatal Error] job.xml:606:51: Character reference #
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Syslog:

[sas@finattr-comp-dev-01 container_1417158738771_0001_02_01]$ cat syslog
2014-11-27 23:13:36,023 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1417158738771_0001_02
2014-11-27 23:13:36,334 FATAL [main] org.apache.hadoop.conf.Configuration: error parsing conf job.xml
org.xml.sax.SAXParseException; systemId: file:///tmp/hadoop-sas/nm-local-dir/usercache/sas/appcache/application_1417158738771_0001/container_1417158738771_0001_02_01/job.xml; lineNumber: 606; columnNumber: 51; Character reference #
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:347)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2183)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2252)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2205)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2112)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:1078)
at org.apache.hadoop.mapreduce.v2.util.MRWebAppUtil.initialize(MRWebAppUtil.java:50)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1407)
2014-11-27 23:13:36,337 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.RuntimeException: org.xml.sax.SAXParseException; systemId: file:///tmp/hadoop-sas/nm-local-dir/usercache/sas/appcache/application_1417158738771_0001/container_1417158738771_0001_02_01/job.xml; lineNumber: 606; columnNumber: 51; Character reference #
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2348)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2205)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2112)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:1078)
at org.apache.hadoop.mapreduce.v2.util.MRWebAppUtil.initialize(MRWebAppUtil.java:50)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1407)
Caused by: org.xml.sax.SAXParseException; systemId: file:///tmp/hadoop-sas/nm-local-dir/usercache/sas/appcache/application_1417158738771_0001/container_1417158738771_0001_02_01/job.xml
How do I get all the distinct field values that satisfy a particular query?
My sample document looks like this:

{ "user": "adfasdk", "act": "Made Purchase", "productPrice": 5099 }

What I want is all the unique users that satisfy a particular query. Say, for example, I have the query below, which gets all the documents whose productPrice is between 5000 and 1, and whose length is between 50 and 100. The returned documents will have repeated users; I just need the unique ones. Do I have to calculate all the unique users from this result myself, or is it possible to do it using an aggregation?

{
  "query": {
    "bool": {
      "must": [
        { "range": { "productPrice": { "gte": 5000, "lte": 1 } } },
        { "range": { "length": { "gte": 50, "lte": 100 } } }
      ]
    }
  },
  "_source": [ "user" ]
}

The aggregation below aggregates over all my documents:

{
  "aggs": {
    "unique_users": {
      "terms": {
        "field": "user"
      }
    }
  }
}

Instead, I want to apply my query from above to my aggregation of unique users. Is it possible? Or do I have to just run my normal query and calculate the unique users myself from the result?
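For what it's worth, a query and an aggregation can be combined in a single search request, and the terms aggregation then only considers documents matching the query. A minimal sketch (the range values are copied from the post - note the productPrice bounds look inverted and probably need adjusting; the top-level "size": 0 suppresses the hits themselves):

curl -XGET 'localhost:9200/_search' -d '{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        { "range": { "productPrice": { "gte": 5000, "lte": 1 } } },
        { "range": { "length": { "gte": 50, "lte": 100 } } }
      ]
    }
  },
  "aggs": {
    "unique_users": {
      "terms": { "field": "user" }
    }
  }
}'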