Re: Indexing large number of files each with a huge size
Hi Jörg,

This is mostly standard code that I am referring to. It is called from multiple threads, each for a different set of files on disk. Please provide your suggestions.

    BulkRequestBuilder bulkRequest = client.prepareBulk();
    bulkRequest.setRefresh(false);
    // for every input file in the input list:
    Map<String, Object> jsonDocument = new HashMap<String, Object>();
    jsonDocument.put("fileContent", STRING_CONTENT_OF_FILE);
    jsonDocument.put("fileProperty1", FILE_PROPERTY_1_STRING);
    jsonDocument.put("fileProperty2", FILE_PROPERTY_2_STRING);
    jsonDocument.put("fileProperty3", FILE_PROPERTY_3_STRING);
    jsonDocument.put("filePath", new BytesRef(filePath.toString()));
    bulkRequest.add(client.prepareIndex(indexName, typeName).setSource(jsonDocument));
    // end of loop
    BulkResponse bulkResponse = bulkRequest.execute().actionGet();

Thanks, Sandeep

On Mon, Aug 25, 2014 at 10:40 PM, joergpra...@gmail.com wrote:

Can you show the program how you index? Before tuning heap sizes or batch sizes, it is good to check that the program works correctly. Jörg

On Mon, Aug 25, 2014 at 7:00 PM, 'Sandeep Ramesh Khanzode' via elasticsearch elasticsearch@googlegroups.com wrote:

Hi, I am trying to index documents, each file approx ~10-20 MB. I start seeing memory issues when I index them all in a multi-threaded environment, from a single TransportClient on one machine to a single-node cluster with a 32GB ES server. It seems like memory is an issue on the client as well as the server side, and I probably understand and expect that :). I have tried tuning the heap sizes and the batch sizes in the Bulk APIs. However, am I trying to push the limits too much? One thought is to stream the data so that I do not hold it all in memory. Is that possible? Is this a general problem, or is my usage just wrong? Thanks, Sandeep

-- You received this message because you are subscribed to the Google Groups elasticsearch group. 
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d2612109-b31c-4127-857b-f8aa27fb0aeb%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
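One way to pursue Sandeep's streaming idea, sketched here with assumed index, type, and file names that are not from the thread: write each batch to a bulk-format file on disk and stream it to the HTTP _bulk endpoint, so neither the client JVM nor your own code ever holds all documents in memory at once.

```shell
# Hypothetical sketch: stream a pre-built bulk file instead of building the
# whole request in the client heap. Keep each batch file small (a few MB).
curl -s -XPOST "localhost:9200/_bulk" --data-binary "@/tmp/bulk-batch-001.json"

# /tmp/bulk-batch-001.json (bulk format: one action line + one source line per doc):
# { "index" : { "_index" : "myindex", "_type" : "mytype" } }
# { "fileContent" : "...", "fileProperty1" : "...", "filePath" : "/data/a.txt" }
```

With --data-binary, curl streams the file from disk, so the batch size on disk, not the total corpus size, bounds client memory.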
Re: aggregate on analyzed field
Hi,

Multi-fields are usually the way to go in such cases, see http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/aggregations-and-analysis.html

On Mon, Aug 25, 2014 at 9:49 PM, kti...@hotmail.com wrote:

I am aggregating documents by customer name to find how many documents we have per customer. The aggregation buckets the individual words in the names. For example, for the customer "Tom Cruise" I get 2 buckets, "Tom" and "Cruise". How can I treat the analyzed field as not_analyzed in the aggregate query? I still want the field to remain analyzed so that I can do full-text search. thanks

-- Adrien Grand
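A minimal sketch of the multi-field approach from the linked guide, with assumed index and type names: the main "name" field stays analyzed for full-text search, while a not_analyzed sub-field is the one you aggregate on, so whole values stay in one bucket.

```shell
# Hypothetical mapping: "name" stays analyzed, "name.raw" is not analyzed.
curl -XPUT "localhost:9200/myindex" -d '{
  "mappings": {
    "customer": {
      "properties": {
        "name": {
          "type": "string",
          "fields": {
            "raw": { "type": "string", "index": "not_analyzed" }
          }
        }
      }
    }
  }
}'

# Aggregate on the raw sub-field so "Tom Cruise" stays a single bucket:
curl -XGET "localhost:9200/myindex/_search" -d '{
  "size": 0,
  "aggs": {
    "per_customer": { "terms": { "field": "name.raw" } }
  }
}'
```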
Re: How to index Office files? *.txt and *.pdf are working...
I see what happened. Could you open an issue in the mapper plugin? Will fix that next week. Thanks for the details!

-- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 25 August 2014 at 15:03, Dirk Bauer dirk.ba...@gmail.com wrote:

Hi David, thx for your help, but it's still not working. What I did: The query

    { "query": { "match": { "_all": "test" } } }

delivers all my indexed documents (including the *.doc / *.docx files) and I can see the base64 stuff in the file.file field. So this looks good to me. Then I went to ..\config\logging.yml and added an entry under the logger: section, 1st attempt: org.apache.plugin.mapper.attachments: TRACE, 2nd attempt: org.apache.tika: TRACE. After shutting ES down, restarting, deleting the existing index and reindexing my test documents, there was no additional entry from the mapper plugin or Tika in the log. ES itself is logging fine...

    logger:
      # log action execution errors for easier debugging
      action: DEBUG
      # reduce the logging for aws, too much is logged under the default INFO
      com.amazonaws: WARN
      # gateway
      #gateway: DEBUG
      #index.gateway: DEBUG
      # peer shard recovery
      #indices.recovery: DEBUG
      # discovery
      #discovery: TRACE
      index.search.slowlog: TRACE, index_search_slow_log_file
      index.indexing.slowlog: TRACE, index_indexing_slow_log_file
      # DBA: Enabled logger for plugin mapper.attachments
      org.apache.plugin.mapper.attachments: TRACE

The next idea was that maybe the mapping plugin is missing some files for parsing Office documents? 
In the plug-in folder I can see these *.jar files: rome-0.9.jar, tagsoup-1.2.1.jar, tika-core-1.5.jar, tika-parsers-1.5.jar, vorbis-java-core-0.1.jar, vorbis-java-core-0.1-tests.jar, vorbis-java-tika-0.1.jar, xercesImpl-2.8.1.jar, xml-apis-1.3.03.jar, xmpcore-5.1.2.jar, xz-1.2.jar, apache-mime4j-core-0.7.2.jar, apache-mime4j-dom-0.7.2.jar, asm-debug-all-4.1.jar, aspectjrt-1.6.11.jar, bcmail-jdk15-1.45.jar, bcprov-jdk15-1.45.jar, boilerpipe-1.1.0.jar, commons-compress-1.5.jar, commons-logging-1.1.1.jar, elasticsearch-mapper-attachments-2.3.1.jar, fontbox-1.8.4.jar, geronimo-stax-api_1.0_spec-1.0.1.jar, isoparser-1.0-RC-1.jar, jdom-1.0.jar, jempbox-1.8.4.jar, jhighlight-1.0.jar, juniversalchardet-1.0.3.jar, metadata-extractor-2.6.2.jar, netcdf-4.2-min.jar, pdfbox-1.8.4.jar. Not sure, but here you can find the additional poi*.jar files that should be responsible for parsing the Office files: http://mvnrepository.com/artifact/org.apache.tika/tika-parsers/1.5 The following files were downloaded to the plugin folder, but the documents are still not parsed: poi-3.10-beta2.jar, poi-ooxml-3.10-beta2.jar, poi-scratchpad-3.10-beta2.jar. The last check was to make sure the Word documents are not corrupted. A colleague of mine checked a test file with java -jar tika-app-1.5.jar -g and the output was fine for the document. So, does anyone have more ideas? Thanks, Dirk

On Monday, 25 August 2014 10:56:54 UTC+2, David Pilato wrote:

From my experience, this should work. Indexing Word docs should work, as Tika supports Office docs. Not sure what you are doing wrong. Try to send a match_all query and ask for the field file.file. Also, you could set the mapper plugin to TRACE mode in logging.yml and see if it tells you something interesting. 
HTH -- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 25 August 2014 at 09:05, Dirk Bauer dirk@gmail.com wrote:

Hi, I am using elasticsearch-1.3.2 with the plug-in name: mapper-attachments, version: 2.3.1, description: "Adds the attachment type allowing to parse different attachment formats", jvm: true, site: false, on Windows 8 for evaluation purposes. JVM: version 1.7.0_67, vm_name: Java HotSpot(TM) Client VM, vm_version: 24.65-b04, vm_vendor: Oracle Corporation. I have created the following mapping:

{
  "myIndex": {
    "mappings": {
      "dokument": {
        "properties": {
          "created": { "type": "date", "format": "dateOptionalTime" },
          "description": { "type": "string" },
          "file": {
            "type": "attachment",
            "path": "full",
            "fields": {
              "file": { "type": "string", "store": true, "term_vector": "with_positions_offsets" },
              "author": { "type": "string" },
              "title": { "type": "string" },
              "name": { "type": "string" },
              "date": { "type": "date", "format": "dateOptionalTime" },
              "keywords": { "type": "string" },
              "content_type": { "type": "string" },
              "content_length": { "type": "integer" },
              "language": { "type": "string" }
            }
          },
          "id": { "type": "string" },
          "title": { "type": "string" }
        }
      }
    }
  }
}

Because I like to use ES from C#/.NET, I have created a little C# app that reads a file as a base64-encoded stream from the hard drive and puts the document into the ES index. I'm working with this POST request:

{
  "id": "8dbf1d73-44d1-4e20-aa35-13b18ddf5057",
  "title": "Test",
  "description": "Test Description",
  "created": "2014-01-20T19:04:20.1019885+01:00",
  "file": {
    "_content_type": "application/pdf",
    "_name": "Test.pdf",
Multi Tenant DB and JDBC River
Hi Jörg,

I am working on a multi-tenant application where each tenant has its own database. I am planning to use ES for indexing the data, and the JDBC river for doing periodic bulk indexing. I do not want to create one river per DB per object type; this would lead to too many rivers. I wanted to modify the JDBC river so that I can give it a parent DB location where all tenant DB connection information is available, and then modify the river so that a feeder thread is created for each tenant database. Do you see any issue with this, or do you have any other recommendation?

Thanks, Nitin
Re: Multi Tenant DB and JDBC River
For multi-tenant, the river concept is awkward. A river is a singleton and is bound to single-user execution, and you are right, creating river instances per DB and per index does not scale. There are several options:

- Write a more sophisticated plugin which acts as a service and not as a singleton. The ES service component, which would maintain state in the cluster state, could accept job requests where each job request is equivalent to a JDBC pull. The job requests are delegated to a node which is not very busy with jobs (load balancing). The code of the JDBC river can be reused for that.

- Write a separate middleware for your tenants where they have separate access to the DB and prepare ES JSON bulk files (maybe via REST API calls similar in style to ES). This would be a domain-specific solution but offers the most flexibility to the tenants; they are free to decide how and when to create and index the data from the DB.

Jörg

On Tue, Aug 26, 2014 at 11:21 AM, Nitin Maheshwari ask4ni...@gmail.com wrote:

Hi Jörg, I am working on a multi-tenant application where each tenant has its own database. I am planning to use ES for indexing the data, and the JDBC river for doing periodic bulk indexing. I do not want to create one river per DB per object type; this would lead to too many rivers. I wanted to modify the JDBC river so that I can give it a parent DB location where all tenant DB connection information is available, and then modify the river so that a feeder thread is created for each tenant database. Do you see any issue with this, or do you have any other recommendation? Thanks, Nitin
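Jörg's second option could, for example, boil down to a per-tenant export job; this sketch assumes tenant-specific indices and pre-built bulk files (all names here are hypothetical, not from the thread):

```shell
# Hypothetical middleware step: push one exported bulk file per tenant into
# a tenant-specific index, instead of running one river per DB per type.
for TENANT in tenant_a tenant_b; do
  curl -s -XPOST "localhost:9200/${TENANT}/_bulk" \
       --data-binary "@/var/export/${TENANT}.bulk"
done
```

A scheduler (cron, or the application itself) decides when each tenant's export runs, which is exactly the flexibility the middleware option is meant to give.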
Re: Building an ERP with Elasticsearch. Am I crazy?
This is the generally accepted dogma, and it has some merit. However, having two storage systems is more than a bit annoying. If you are aware of the limitations and caveats, Elasticsearch is actually a perfectly good document store that happens to have a deeply integrated querying engine. This is useful since most setups with a secondary store come with a much less capable querying engine, plus additional latency and architectural complexity from pumping data over to Elasticsearch. Elasticsearch CRUD operations are atomic, i.e. you can read your own writes across the cluster. If you use the version attribute during updates, you can detect version conflicts and prevent overwriting updates with stale data as well. This is a similar model to what you would find in e.g. CouchDB and similar document stores. There are not that many sharded, replicated, horizontally scalable document stores out there, and even fewer with decent querying ability. The caveat is that Elasticsearch is not as battle-tested as other solutions in this space, and various people have shown that ways exist to cause an Elasticsearch cluster to lose data, corrupt data, etc. So, you need to be prepared to recover from such situations. That means you need backups (e.g. use the snapshot feature) and a plan for when things go bad. The flip side is that other solutions have issues as well. PostgreSQL clustering is brand new and probably has issues, and if you use it in non-clustered mode, the failure scenarios get even more interesting. I use MariaDB Galera Cluster and it sucks big time; it needs a lot of handholding during upgrades. CouchDB doesn't shard and shares server failure scenarios with Elasticsearch. MongoDB and Cassandra have each had their share of issues related to data corruption and data loss in the recent past, and both have recently fixed major issues related to that. So, there are lots of solutions out there and none of them are perfect. 
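The version-based conflict detection mentioned above can be sketched like this (index, type, and field names are assumed): the write succeeds only if the document is still at the version the client last read; a concurrent update bumps the version, and ES then rejects the stale write with a version conflict (HTTP 409) instead of silently overwriting.

```shell
# Hypothetical example: re-index document 1 only if it is still at version 3.
# If another client updated it in the meantime, this request fails with a
# VersionConflictEngineException rather than clobbering the newer data.
curl -XPUT "localhost:9200/erp/order/1?version=3" -d '{
  "status": "shipped"
}'
```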
Elasticsearch has several major areas where it needs improvement (and which are indeed being worked on in recent versions):

1) It has many ways it can run out of memory. If you skim through the release notes of recent versions, you'll see a lot of fixes related to that, including the use of e.g. circuit breakers. The problem with OOMs is that they can cause a cascading cluster failure where one node becomes slow, eventually drops out of the cluster, and then other nodes start having the same issues. I've personally seen Kibana kill our cluster on two occasions. In both cases the logs of all nodes were full of OOMs and the cluster died while simply clicking through different dashboards in Kibana. This has not happened with the current 1.3.x version (yet), but that doesn't mean it is impossible.

2) Split-brain situations, where a quorum is lost but not detected, are fairly easy to trigger. Every time I do a rolling update, the cluster takes several seconds to catch up with the fact that I'm shutting down nodes. I have a three-node cluster. One node down means my cluster should be yellow. Two nodes down means red, and it should no longer accept writes. The problem is that during those few seconds, the cluster status may not reflect reality, and nodes may in fact be accepting writes when they shouldn't.

3) A full cluster restart needs a lot of handholding. The reason for this is that most of the failure scenarios relate to there not being a quorum and detecting that. For example, if you simply restart the nodes one by one quickly, you will easily get your cluster into a red state where it should no longer be accepting writes. The problem, as described above, is that detecting this relies on timeouts, and some nodes may continue to accept writes for a few seconds after they should have stopped. By the time your cluster goes red, it's too late and you are going to have to manually decide which shards you want to lose. 
That's why you need to keep an eye on cluster status during rolling updates. Imagine somebody power-cycling your Elasticsearch cluster or, worse, rebooting the switch that connects your nodes.

4) Elasticsearch under load may throw 503s occasionally. I've seen this happen on our test infrastructure a couple of times and it worries me. This is not something you want to see when you are writing customer data.

Mitigation for these issues typically involves using specialized nodes for read traffic, write traffic, and cluster management. Additionally, you need to heavily tweak things to make certain failure scenarios less likely. Out of the box, there is a lot of stuff that can go wrong. We're actually deprecating our MariaDB architecture and switching to an Elasticsearch-only architecture. I'm well aware that I'm taking a risk here, and I have a backup plan for most of those risks. This includes changing plans and switching to CouchDB or a similar document store
Re: Shards
Hi,

I've found the problem: the JSON structure was not correct. It has to be this if you are using the Java API:

{
  "analysis": { ... },
  "index": {
    "number_of_replicas": 1,
    "number_of_shards": 3
  }
}

Thanks, Markus ;)

From: elasticsearch@googlegroups.com [mailto:elasticsearch@googlegroups.com] On behalf of Markus Wiesenbacher
Sent: Monday, 25 August 2014 23:55
To: elasticsearch@googlegroups.com
Subject: Shards

Hi folks, I am using a single-node cluster (v1.3.2) on my PC, and I was wondering why there are always 5 shards in the file system (separate Lucene indices), no matter how many I configure in elasticsearch.yml or programmatically with the Java API (loadFromSource with a JSON string). Do I misunderstand something? Many thanks! Markus ;)

BTW: Here's my JSON for the settings:

{
  "analysis": { ... },
  "settings": {
    "index": {
      "number_of_replicas": 1,
      "number_of_shards": 3
    }
  }
}
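For reference, a sketch of the REST equivalent (index name assumed): the shard count is fixed at index creation time, so the settings have to be supplied when the index is created; changing elasticsearch.yml only affects indices created afterwards.

```shell
# Create an index with explicit shard/replica settings (REST API uses the
# "settings" wrapper, unlike the Java loadFromSource form above).
curl -XPUT "localhost:9200/myindex" -d '{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 1
    }
  }
}'
```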
Get distinct result by using multi_match and suggestion
Is there a way to solve the following problem? I have created a search field with suggestion functionality. The user is able to search for names, categories, etc. These fields are mapped like:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "edge_ngram_tokenizer",
          "filter": [ "lowercase" ]
        }
      },
      "tokenizer": {
        "edge_ngram_tokenizer": {
          "type": "edgeNGram",
          "min_gram": 1,
          "max_gram": 20,
          "token_chars": [ "letter", "digit" ]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "dynamic": "strict",
      "properties": {
        "id": { "type": "long" },
        "name": {
          "type": "string",
          "analyzer": "english",
          "fields": {
            "autocomplete": {
              "type": "string",
              "index_analyzer": "autocomplete",
              "search_analyzer": "standard"
            }
          }
        },
        "category": {
          "type": "string",
          "analyzer": "english",
          "fields": {
            "autocomplete": {
              "type": "string",
              "index_analyzer": "autocomplete",
              "search_analyzer": "standard"
            }
          }
        },
        ...

Now when I do something like this:

curl -XGET "http://localhost:9200/my_index/my_type/_search" -d '
{
  "_source": false,
  "query": {
    "multi_match": {
      "query": "pet",
      "fields": [ "*.autocomplete" ]
    }
  }
}'

I get results like these:
- Peter
- Peter
- Peter
- Petra
- Petra
etc.

How can I reduce (distinct) the results on the server side to something like this?
- Peter
- Petra
- etc.

thx, Ramy
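One possible server-side approach, assuming a not_analyzed raw sub-field is added to the mapping (the "name.raw" field here is an assumption, not part of the mapping above): drop the hits to zero and read the distinct values out of a terms aggregation, where duplicates collapse into buckets.

```shell
# Sketch: the multi_match still narrows to matching docs, but the distinct
# names come from the aggregation buckets instead of the (duplicated) hits.
curl -XGET "http://localhost:9200/my_index/my_type/_search" -d '{
  "size": 0,
  "query": {
    "multi_match": { "query": "pet", "fields": [ "*.autocomplete" ] }
  },
  "aggs": {
    "distinct_names": { "terms": { "field": "name.raw" } }
  }
}'
```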
Re: Building an ERP with Elasticsearch. Am I crazy?
Mohit Anchlia, how do you sync ES with your main DB? That's what I'm thinking about for my project, because I don't have much experience with ES. Thanks

On Aug 26, 2014 1:55 AM, Mohit Anchlia mohitanch...@gmail.com wrote:

In general, use Elasticsearch only as a secondary index. Have a copy of the data somewhere else which is more reliable. Elasticsearch often runs into index corruption issues which are hard to resolve.

On Mon, Aug 25, 2014 at 9:30 PM, xiehai...@gmail.com wrote:

On Tuesday, August 26, 2014 6:46:12 AM UTC+8, Raphael Waldmann wrote:

Hi, first I would like to thank all of you for Elastic. I am thinking of using it in an ERP that I am building. What do you think about this? Am I crazy? Has someone faced this? I really don't think I am comfortable enough to do this, trading the problems that I already know for new problems that I don't know how to deal with. I believe that NoSQL will prevail over traditional SQL, but I don't know if I am ready for this task. So how do you think I should integrate (or not) PostgreSQL with Elasticsearch?

Do you plan to use ES to index data in PostgreSQL? I have a similar idea: I want to use ES instead of a data warehouse. Some problems I can see: 1) Data in an RDBMS is stored in tables connected by relationships. You can use a very complex SQL query to get a complex result; how do you do that in ES? 2) If you want to run some analysis algorithms on existing data, how do you run them in ES? 3) If your data is big enough, will searching for one keyword in the '_all' field be slow? Thanks. -Terrs

Thanks again, rsw1981
Re: AutoCompletion Suggester - Duplicate record in suggestion return
Hi Alexander,

If I may, I have a follow-up question to your response here. How does the completion suggester behave with fields such as payload and score when it is unifying the response based on output? Are scores increased based on this combination? If payloads are different, which ones are returned? Thanks for your help! Alistair

On Monday, April 21, 2014 2:26:13 PM UTC+2, Alexander Reelsen wrote:

Hey, the output is used to unify the search results; otherwise the input is used. The payload itself is just meta information. The main reason why you see the suggestion twice is that, even though a document is deleted and cannot be found anymore, the suggest data structures are only cleaned up during merges/optimizations. Running optimize should fix this. Makes sense? --Alex

On Sun, Apr 13, 2014 at 12:49 PM, kidkid zki...@gmail.com wrote:

I have figured out the problem. The main problem is that I used the same output for all inputs, so ES behaved wrongly in this case. I am still trying to improve the performance. I am testing on a 64GB RAM server (32GB for ES 1.0.1) with 24 cores. With only 2 records, it still took 3ms to suggest.

On Sunday, April 13, 2014 4:53:21 PM UTC+7, kidkid wrote:

There is something really strange here. I don't know whether anyone has worked with this feature, or whether it is just not stable. If we index the same input with different output/payload, then only one result is found. Can anyone tell me how I could fix it? 
Timezone in Simple Query
All dates are UTC. Internally, a date maps to a number of type long. When applied to date fields, the range filter also accepts a time_zone parameter:

{
  "range": {
    "born": {
      "gte": "2012-01-01",
      "time_zone": "+1:00"
    }
  }
}

but this is not possible:

{
  "match": {
    "post_date": "2012-01-01",
    "time_zone": "+1:00"    <-- this does not work
  }
}

What can I do to let users query correctly with respect to their own time zone, by appending it to the query? Tnx
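One workaround sketch (index and field names are assumed): since match has no time_zone option, combine the text part of the query with a range filter carrying the user's offset, using the ES 1.x filtered query.

```shell
# Sketch: the range filter is the only place that accepts time_zone, so the
# user's offset lives there while the match clause handles the text part.
curl -XGET "localhost:9200/myindex/_search" -d '{
  "query": {
    "filtered": {
      "query":  { "match": { "title": "something" } },
      "filter": {
        "range": {
          "post_date": {
            "gte": "2012-01-01",
            "lt":  "2012-01-02",
            "time_zone": "+1:00"
          }
        }
      }
    }
  }
}'
```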
Re: Building an ERP with Elasticsearch. Am I crazy?
I am reading a lot, studying what the best approach for this is. My main question can be summed up in two points: If I choose ES to index my PostgreSQL data, what's the best way to do that? And do I need a cluster? Most of the problems I read about were related to that. If that is true, and I can run on one node, should I do that?

Thanks for sharing your experience. Have a nice day, Raphael Waldmann
Aggregation across indices
Hi,

If I have two indices, each holding part of a record and joined by some common identifier, can I issue a query across both indices and have aggregations apply taking both indices into consideration?

Example:

Index 1, Type 1:
  ID: String
  Field1: String
  Field2: String

Index 2, Type 2:
  ID: String (from above; I can keep this the same so it behaves like a foreign key)
  Field3: String
  Field4: String

Can I effect a join across both indices and aggregate on Field4, for example? Please let me know.

Thanks, Sandeep
Re: Ability to search across 'types' in the same index, with different search parameters yet applying the same size and from values, in a single search query
Hello AJ,

You can do this as follows:

{
  "query_string": {
    "query": "test-type1.status:1 || test-type2.status:2"
  }
}

But there is a bug associated with a corner condition of this: https://github.com/elasticsearch/elasticsearch/issues/4081 So be a bit careful.

Thanks, Vineeth

On Tue, Aug 26, 2014 at 1:21 PM, Ajinkya Apte ajin...@gmail.com wrote:

Hello, examples of some documents:

POST /test-index/test-type-1/doc-1
{ "text": "Some text", "status": 1 }

POST /test-index/test-type-2/doc-1
{ "text": "Some new text", "status": 1 }

POST /test-index/test-type-2/doc-2
{ "text": "Some even new text", "status": 2 }

Is there a single query I can use to get all the documents that have 'status'=1 in 'type'='test-type-1' and 'status'=2 in 'type'='test-type-2', applying the same 'size' and 'from' params? Right now I am running two different queries and then trying to merge the results programmatically. Any better way you recommend? AJ
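An alternative sketch without query_string, using only constructs that exist in ES 1.x: a bool query whose branches each pin one type and one status, sharing a single size/from for paging.

```shell
# Sketch: each should branch is a filtered query combining a type filter
# with a term filter on status; at least one branch must match.
curl -XGET "localhost:9200/test-index/_search" -d '{
  "from": 0,
  "size": 10,
  "query": {
    "bool": {
      "should": [
        { "filtered": { "filter": { "bool": { "must": [
            { "type": { "value": "test-type-1" } },
            { "term": { "status": 1 } } ] } } } },
        { "filtered": { "filter": { "bool": { "must": [
            { "type": { "value": "test-type-2" } },
            { "term": { "status": 2 } } ] } } } }
      ]
    }
  }
}'
```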
Re: Aggregation across indices
Hello Sandeep, What you are intending is not possible. But Elasticsearch does have some good relational operations (such as parent/child and nested documents), which need to be defined before indexing. If you can elaborate on your use case, we can help with this. Thanks Vineeth On Tue, Aug 26, 2014 at 6:04 PM, 'Sandeep Ramesh Khanzode' via elasticsearch elasticsearch@googlegroups.com wrote: Hi, If I have two indices, each having part of the record and joined using some common identifier, can I issue a query across both indices and have aggregations apply taking both indices into consideration? Example: Index 1: Type 1: ID: String, Field1: String, Field2: String. Index 2: Type 2: ID: String (from above; I can keep this the same to behave like a foreign key), Field3: String, Field4: String. Can I effect a join across both indices and aggregate on Field4, for example? Please let me know. Thanks, Sandeep
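Absent server-side joins across indices, the usual fallback is the client-side join Sandeep would otherwise do in application code: query each index, join on the shared ID, then aggregate. A rough sketch (field names from the example above; the hit data is hypothetical):

```python
from collections import Counter

# Hits fetched from index 1 (ID, Field1, Field2) and index 2 (ID, Field3, Field4).
index1_hits = [{"ID": "a", "Field1": "x"}, {"ID": "b", "Field1": "y"}]
index2_hits = [{"ID": "a", "Field4": "red"}, {"ID": "b", "Field4": "red"},
               {"ID": "c", "Field4": "blue"}]

# Join on the shared ID, then aggregate Field4 over joined records only.
ids_in_index1 = {hit["ID"] for hit in index1_hits}
joined = [hit for hit in index2_hits if hit["ID"] in ids_in_index1]
field4_counts = Counter(hit["Field4"] for hit in joined)
print(field4_counts)  # counts only IDs present in both indices
```

For large result sets this gets expensive, which is why denormalizing into one index (or parent/child) before indexing is usually preferred.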
Failed start of 2nd instance on same host with mlockall=true
Hi all, In an attempt to squeeze more power out of our physical servers we want to run multiple ES JVMs per server. Some specs: - servers have 24 cores, 256GB RAM - each instance binds to a different (alias) IP - each instance has 32GB heap - both instances run under user 'elastic' - limits for 'elastic' user: memlock=unlimited - ES config for both instances: bootstrap.mlockall=true The 1st instance has been running for weeks. When starting the 2nd instance the following things happen: - increase of overall CPU load - lots of I/O to disks - no logging for 2nd instance - 2nd instance hangs - 1st instance keeps running, but gets sluggish - cd /proc/pid causes a hang of the cd process (until 2nd instance is killed) - exec 'ps axuw' causes a hang of the ps process (until 2nd instance is killed) Maybe (un)related: I have never been able to run Elasticsearch in a VirtualBox with memlock=unlimited and mlockall=true. After an hour of trial and error I found that removing the setting 'bootstrap.mlockall' (i.e. setting it to false) from the 2nd instance's configuration fixes things. I am confused, but acknowledge I do not know anything about memlocking. Any ideas? Regards, Renzo
Re: Need some advice to build a central log.
Hello Sang, Can I know why you are using Hive? I feel you can do the analysis in Elasticsearch itself. The rest seems good to me. Thanks Vineeth On Tue, Aug 26, 2014 at 8:03 AM, Sang Dang zkid...@gmail.com wrote: Hello All, I have selected #2 as my solution. I write data to ES, and use Kibana for realtime monitoring. For stats, I use Hive. For each project I will create an index, and for each type of log I will use an ES type, e.g. for ProjectX: log_debug, log_error, Stats_API, Stats_PageView, Stats_XYZ. I wonder whether that's good? Should I also separate by time for each type of project? Regards.
Re: Is it possible to register a RestFilter without creating a plugin?
Hello Jinyuan, I don't think this is possible. With such a provision, how would you define what the REST API should do? Thanks Vineeth On Tue, Aug 26, 2014 at 2:41 AM, Jinyuan Zhou zhou.jiny...@gmail.com wrote: Thanks,
define multiple types in an index
Hello, I am using Elasticsearch 1.3.2 and am trying to understand Elasticsearch (with my Oracle background ;-)). For testing I use the data available on http://fec.gov/disclosurep/PDownload.do There is a datafile for every state of the USA. I don't know whether it is a good idea, but I want to make one index with a type for every state. I want to define the fields and their types in advance. Can I create the index with type AL and add other types after creation? I tried but was not able to do it. I created the following index: curl -XPOST localhost:9200/contributions -d '{ "settings" : { "number_of_shards" : 10, "number_of_replicas" : 1, "_index" : true }, "mappings" : { "AK" : { "properties" : { "cand_id" : { "type" : "string", "index" : "not_analyzed" }, "cand_nm" : { "type" : "string" }, "cmte_id" : { "type" : "string" } } }, "AL" : { "properties" : { "cand_id" : { "type" : "string", "index" : "not_analyzed" }, "cand_nm" : { "type" : "string" }, "cmte_id" : { "type" : "string" } } } } }' Can I add types for AR and AZ after creation? They have the same column definition. Is there a better way to achieve this? Regards HansP
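New types can be added to an existing index with the put mapping API (PUT /contributions/_mapping/{type}), one call per state. A minimal sketch of assembling those request bodies, assuming the field definitions from the post above; Python is used only to build and print the JSON, not to execute the calls:

```python
import json

# Field definitions shared by every state type (from HansP's example).
state_properties = {
    "cand_id": {"type": "string", "index": "not_analyzed"},
    "cand_nm": {"type": "string"},
    "cmte_id": {"type": "string"},
}

def put_mapping_body(type_name):
    """Body for PUT /contributions/_mapping/<type_name>."""
    return json.dumps({type_name: {"properties": state_properties}})

# One put-mapping call per state added after index creation:
for state in ["AR", "AZ"]:
    print("PUT /contributions/_mapping/%s" % state)
    print(put_mapping_body(state))
```

Since all states share one column definition, an alternative worth considering is a single type with a "state" field, which avoids repeating the mapping fifty times.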
Re: Swap indexes?
I was looking for the index alias, thanks all. On Tuesday, June 17, 2014 9:31:00 AM UTC+1, Lee Gee wrote: Is it possible to have one ES instance create an index and then have a second instance use that created index, without downtime? tia lee
Re: _suggest suggestion/question
Thank you, Vineeth. On Sunday, August 17, 2014 12:04:20 PM UTC+1, vineeth mohan wrote: Hello Lee, You will need to use the context suggester for this purpose - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/suggester-context.html This difference stems from the fact that the actual data and the auto-completion data are stored in different data structures. This is to make sure that the auto-completion data is memory resident and thus super fast. Thanks Vineeth On Sun, Aug 17, 2014 at 3:32 PM, Lee Gee lee...@gmail.com wrote: My reading, which may not be accurate, of this [1] clear and concise post, is that it is not possible to use a reference to an existing field as an argument to a suggester's 'input' or 'payload' fields. Please would you clarify if I have missed something? If I am correct, would it be much work to add these features? TIA Lee [1] http://www.elasticsearch.org/blog/you-complete-me/
Re: How to get the field information when _all and _source are disabled
Hello Wang, By default the _source field stores the input JSON and gives it back for each matching document. If you disable it, ES won't be able to return it; hence the result you see. By default ES won't make any effort to tap the stored-field information; it simply takes the JSON stored in the _source field. To get back text that was mapped as stored, you need to use the fields option, i.e. you need to tell ES which fields you want. That information will then be fetched from the stored fields rather than from _source. In your query, you need to mention the fields you are interested in - searchRequestBuilder.setTypes("type1").addFields("title") (the equivalent in Java). Thanks Vineeth On Mon, Aug 25, 2014 at 1:09 PM, Wang Mingxing wmx...@gmail.com wrote: Hi, I created an index named test_all, and it has a type: type1. I want to test the usage of _all and _source, so I set both to false. The mapping is as follows: $ curl -XGET 'localhost:9200/test_all/_mapping/type1?pretty' { "test_all" : { "mappings" : { "type1" : { "_all" : { "enabled" : false }, "_source" : { "enabled" : false }, "properties" : { "content" : { "type" : "string", "analyzer" : "ik" }, "title" : { "type" : "string", "store" : true, "analyzer" : "ik" } } } } } } In type1 I store the title information. I inserted five documents into type1. But when retrieving them, I could not find the title field information.
$ curl -XGET 'localhost:9200/test_all/type1/_search?pretty' { "took" : 16, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 5, "max_score" : 1.0, "hits" : [ { "_index" : "test_all", "_type" : "type1", "_id" : "zWQno3rLS56hkwJ_Y108Dg", "_score" : 1.0 }, { "_index" : "test_all", "_type" : "type1", "_id" : "BDKa-IP7TDK_iM2VNGFPYw", "_score" : 1.0 }, { "_index" : "test_all", "_type" : "type1", "_id" : "n97suWSwQACgx35APTOqPg", "_score" : 1.0 }, { "_index" : "test_all", "_type" : "type1", "_id" : "2P7OblUiQB2Y8ZCtWWWTdg", "_score" : 1.0 }, { "_index" : "test_all", "_type" : "type1", "_id" : "Lo_PFVeKTEWazwCLbyKAqQ", "_score" : 1.0 } ] } } Then I tried to resolve it with the Java API: public static void indexSearch(Client client) { SearchRequestBuilder searchRequestBuilder = client.prepareSearch("test_all"); searchRequestBuilder.setTypes("type1"); SearchResponse searchResponse = searchRequestBuilder.execute().actionGet(); SearchHit[] hits = searchResponse.getHits().getHits(); System.out.println("count: " + hits.length); for (SearchHit hit : hits) { System.out.println("docID: " + hit.getId()); System.out.println("score: " + hit.getScore()); System.out.println("title: " + hit.getFields().get("title").toString()); } } and it shows: Exception in thread "main" count: 5 java.lang.NullPointerException at es.api.Test_All.indexSearch(Test_All.java:64) at es.api.Test_All.main(Test_All.java:73) docID: zWQno3rLS56hkwJ_Y108Dg score: 1.0 I guess the value doesn't exist. Can you tell me why? Many Thanks.
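Vineeth's advice above — with _source disabled, request the stored fields explicitly instead of relying on _source — corresponds to a search body like the following. Python is used here just to build and check the JSON; index, type, and field names are taken from the thread:

```python
import json

# With _source disabled, stored fields must be requested via "fields";
# ES then reads them from the stored-field space, not from _source.
search_body = {
    "query": {"match_all": {}},
    "fields": ["title"],
}

print("GET /test_all/type1/_search")
print(json.dumps(search_body, indent=2))
```

In the hits, the requested values then come back under each hit's "fields" key (hit.getFields() in the Java API) rather than in _source.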
gateway.recover_after_nodes minimum_master_nodes in a distributed environment?
Hello all, Question about gateway.recover_after_nodes and discovery.zen.minimum_master_nodes in a distributed ES cluster. By distributed I mean I have: 2 nodes that are data only: 'node.data' = 'true', 'node.master' = 'false', 'http.enabled' = 'false', 1 node that is a master/search only node: 'node.master' = 'true', 'node.data' = 'false', 'http.enabled' = 'true', When setting discovery.zen.minimum_master_nodes, is the (n / 2) + 1 formula including *all* nodes of all types in the cluster, or just those who can be masters? Similarly, when setting gateway.recover_after_nodes, is this value the number of all nodes of all types in the cluster, or just those that are data nodes? Thank you very much for your time! Chris
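For reference, the (n / 2) + 1 quorum formula mentioned above is conventionally applied to master-eligible nodes only (a sketch; verify against the documentation for your ES version):

```python
def minimum_master_nodes(master_eligible):
    """Quorum of master-eligible nodes: (n / 2) + 1, integer division."""
    return master_eligible // 2 + 1

# The cluster described above has 1 master-eligible node:
print(minimum_master_nodes(1))  # -> 1
# A typical cluster with 3 master-eligible nodes:
print(minimum_master_nodes(3))  # -> 2
```

Note a single master-eligible node gives a quorum of 1, so that node is a single point of failure regardless of how many data nodes exist.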
Re: Using elasticsearch as a realtime fire hose
You might want to look at developing a plugin for this, or maybe using an existing one. This one, for example, might do part of what you need: https://github.com/derryx/elasticsearch-changes-plugin If you develop your own plugin, you should be able to tap into what is happening in the cluster at a pretty low level. Jilles On Monday, August 25, 2014 9:27:42 AM UTC+2, Jim Alateras wrote: What kind of events do you think of? Single new document indexed? Batch of docs indexed? Node-wide? Or cluster-wide? an event whenever a document is added to an index, cluster-wide You mention Redis; for something like a publish/subscribe pattern, you'd have to use a persistent connection and implement your own ES actions, which is possible with e.g. HTTP websockets A sketchy implementation can be found here: https://github.com/jprante/elasticsearch-transport-websocket thanks for the reference, I will have a deeper look at it. Jörg On Sat, Aug 23, 2014 at 8:09 PM, Jim Alateras j...@sutoiku.com wrote: I was wondering whether there were any mechanisms to use ES as a realtime feed for downstream systems. I have a cluster that gathers observations from many sensors. I need to maintain a list of realtime counters in Redis, so I want to further process these observations once they hit the database. Additionally, I also want to be able to create event streams for different types of feeds. I could do all this outside ES, but I was wondering whether there are mechanisms within ES that will allow me to subscribe to add events for a particular type or index. cheers /jima
Re: Logstash stop communicating with Elasticsearch
I had some issues with logstash as well and ended up modifying the elasticsearch_http plugin to tell me what was going on. Turned out my cluster was red because my index template required more replicas than was possible :-). The problem was that logstash does not fail very gracefully, and its logging is not that great either (which I find ironic for a logging-centric product). So I modified it to simply log the actual Elasticsearch response, which was a 503 unavailable. From there it was pretty clear what to fix. I filed a bug + pull request for this but it seems nobody has done anything with it so far: https://github.com/elasticsearch/logstash/issues/1367 Jilles On Saturday, August 23, 2014 2:51:18 PM UTC+2, 凌波清风 wrote: Hello, I have also encountered this problem; in my case the error occurs every morning. I do not know how to solve it and am hoping for some help. Thx. On Friday, July 18, 2014 at 8:56:54 PM UTC+8, Alexandre Fricker wrote: Everything was working fine until 4h this morning, when Logstash stopped sending new logs to Elasticsearch. When I stop and then restart the logstash process, it reprocesses a bulk of new log lines, and when it starts to send them to Elasticsearch it starts writing this message again and again: {:timestamp=2014-07-18T09:46:29.593000+0200, :message=Failed to flush outgoing items, :outgoing_count=86, :exception=#RuntimeError: Non-OK response code from Elasticsearch: 404, :backtrace=[/soft/sth/lib/logstash/outputs/elasticsearch/protocol.rb:127:in `bulk_ftw', /soft/sth/lib/logstash/outputs/elasticsearch/protocol.rb:80:in `bulk', /soft/sth/lib/logstash/outputs/elasticsearch.rb:321:in `flush', /soft/sth/vendor/bundle/jruby/1.9/gems/stud-0.0.17/lib/stud/buffer.rb:219:in `buffer_flush', org/jruby/RubyHash.java:1339:in `each', /soft/sth/vendor/bundle/jruby/1.9/gems/stud-0.0.17/lib/stud/buffer.rb:216:in `buffer_flush', /soft/sth/vendor/bundle/jruby/1.9/gems/stud-0.0.17/lib/stud/buffer.rb:193:in `buffer_flush',
/soft/sth/vendor/bundle/jruby/1.9/gems/stud-0.0.17/lib/stud/buffer.rb:112:in `buffer_initialize', org/jruby/RubyKernel.java:1521:in `loop', /soft/sth/vendor/bundle/jruby/1.9/gems/stud-0.0.17/lib/stud/buffer.rb:110:in `buffer_initialize'], :level=:warn} But when I check the Elasticsearch status in Elasticsearch HQ, everything is green and OK. From the day before, nothing changed except that I added a new type of data, but that is only 15 logs every 1 minute.
Re: Java API or REST API for client development ?
I use an in-house developed Java REST client for Elasticsearch. Unfortunately it's not in any shape to untangle from our code base and put on GitHub yet, but I might consider that if there's more interest. Basically I use Apache HttpClient; I implemented a simple round-robin strategy so I can fail over if nodes go down, and I implemented a simple REST client around this to support put/post/delete/get requests. I also added some basic interpretation of statuses and mapped those to sensible exceptions. The idea is that this client is wrapped with another client that supports the more high-level APIs exposed from Elasticsearch. So you can do things like index/delete documents, manage aliases, do bulk indexing, etc. My long-term goal was actually to have two implementations of that client: one for REST and one for embedded Elasticsearch. That would be an interesting project because it would give you choice. Except, I never got around to doing the embedded client implementation since we don't really need it so far. Something else we use models the query DSL with static Java methods and provides a simple DSL for creating queries in Java. This in turn uses my github jsonj project that allows you to programmatically manipulate JSON structures. None of this is particularly complicated, but altogether there is quite a bit of code to write and quite a few things you can get wrong. It's always hard to separate the general-purpose stuff from the application-specific stuff, and that's one reason why I have not yet put this code out. Jilles On Wednesday, March 26, 2014 10:46:16 AM UTC+1, Subhadip Bagui wrote: Hi, We have a cloud management framework where all the event data are to be stored in Elasticsearch. I have to start the client-side code for this. I need a suggestion here. Which one should I use for the client, the Elasticsearch Java API or the REST API?
Kindly suggest, and mention the pros and cons of each, so it will be easy for me to decide on the product design now rather than deal with the hassle later. Subhadip
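The round-robin failover strategy Jilles describes can be sketched roughly as below. Python is used for illustration; the node URLs are hypothetical, and a real client would wrap HTTP calls (Jilles uses Apache HttpClient) where request_fn stands here:

```python
import itertools

class RoundRobinClient:
    """Rotate through ES HTTP nodes, failing over when one is down."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._cycle = itertools.cycle(self.nodes)

    def execute(self, request_fn):
        # Try each node at most once per call, in round-robin order.
        last_error = None
        for _ in range(len(self.nodes)):
            node = next(self._cycle)
            try:
                return request_fn(node)
            except IOError as err:  # node unreachable: fail over to the next
                last_error = err
        raise last_error  # every node failed

client = RoundRobinClient(["http://es1:9200", "http://es2:9200"])
```

request_fn would be a function performing the actual GET/PUT/POST/DELETE against the given node; higher-level operations (bulk indexing, alias management) can then be layered on top of execute(), as in the post above.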
Re: Failed start of 2nd instance on same host with mlockall=true
You should run one node per host. Two nodes add overhead and suffer from the effects you described. For mlockall, the user needs the privilege to allocate the specified locked memory, and the OS needs contiguous RAM per mlockall call. If the user's memlock limit is exhausted, or if RAM allocation gets fragmented, memlocking is no longer possible and fails. Jörg On Tue, Aug 26, 2014 at 2:54 PM, R. Toma renzo.t...@gmail.com wrote: Hi all, In an attempt to squeeze more power out of our physical servers we want to run multiple ES jvm's per server. [...]
Re: how to use my custom lucene analyzer (tokenizer)?
Thanks Jun, that was helpful. It helped me realize I had not fully connected my analyzer plugin. On Thursday, August 21, 2014 11:23:47 PM UTC-7, Jun Ohtani wrote: Hi Art, I wrote an example specifying the kuromoji analyzer (kuromoji) and a custom analyzer (my_analyzer) for a field: curl -XPUT "http://localhost:9200/kuromoji-sample" -d' { "settings": { "index": { "analysis": { "analyzer": { "my_analyzer": { "tokenizer": "kuromoji_tokenizer", "filter": [ "kuromoji_baseform" ] } } } } }, "mappings": { "sample": { "properties": { "title": { "type": "string", "analyzer": "my_analyzer" }, "body": { "type": "string", "analyzer": "kuromoji" } } } } }' I hope that it will be helpful for you. 2014-08-22 9:18 GMT+09:00 a...@safeshepherd.com: I have the same question about using an analyzer I have written as a plug-in for ElasticSearch 1.3. https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/blob/es-1.3/README.md demonstrates only how to use the tokenizers in combination with the built-in CustomAnalyzer. It does not show how to use the kuromoji analyzer itself. When I try to specify my analyzer for a field, I get errors like this: MapperParsingException[Analyzer [special_analyzer] not found for field [foo]]; Can you show an example of how to specify the kuromoji analyzer for a field? I should then be able to adapt it for use with my plugin analyzer. Thanks in advance, Art On Tuesday, August 5, 2014 12:34:42 AM UTC-7, Jun Ohtani wrote: Hi, I think this plugin will be helpful for you: https://github.com/elasticsearch/elasticsearch-analysis-kuromoji 2014/08/05 15:58 fanc...@gmail.com: I want to use my own Chinese analyzer, and I can write the Lucene analyzer class myself. How can I integrate it into Elasticsearch? I googled and found http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/custom-analyzers.html but it only combines existing tokenizers and filters. I want to use a tokenizer written in Java by myself.
-- Jun Ohtani blog : http://blog.johtani.info
Function Query with an aggregation function of nested field
I have documents with the following schema: { "authorId": 10, "authorName": "Joshua Bloch", "books": [ { "bookId": 101, "bookName": "Effective Java", "description": "effective java book with useful recommendations", "Category": 1, "sales": [ { "keyword": "effective java", "count": 200 }, { "keyword": "java tips", "count": 100 }, { "keyword": "java joshua bloch", "count": 50 } ], "createDate": "08-25-2014" }, { "bookId": 102, "bookName": "Java Puzzlers", "description": "Java Puzzlers: Traps, Pitfalls, and Corner Cases", "Category": 2, "sales": [ { "keyword": "java puzzlers", "count": 100 }, { "keyword": "joshua bloch puzzler", "count": 50 } ] } ] } The sales information is stored with each book along with the search query that led to that sale. If the user applied a category filter, I would like to count only books that belong to that category. I would like to sort the list of authors returned based on a function of sales data and text match. For example, if the search query is "java" I would like to return the above-mentioned doc and all other author documents which have the term "java" in them. I came up with the following query: { "query": { "function_score": { "boost_mode": "replace", "query": { "match": { "bookName": "java" } }, "script_score": { "params": { "param1": 2 }, "script": "doc['books.sales.count'].isEmpty() ? _score : _score * doc['books.sales.count'].value * param1" } } } } I have a few questions about the query above: 1. The results don't look sorted by sales; I have authors who don't have any books with sales in them at the top. 2. How do I use the sum of all sales for an author (across all books within the author document) in the script? Is there a sum function for nested fields inside a document when using script_score? Note that sales is a nested field inside another nested field, books. 3. As a next step I would also like to use a filter for keyword within the script_score to only include sales whose keyword value matches the search query term. Any help would be much appreciated.
Thanks, Srini
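For question 2, one possible direction is a script that sums all the nested counts instead of reading a single value. This is only a sketch, not a tested solution: whether `doc['books.sales.count'].values` exposes all nested counts at the parent level depends on the mapping, and the Groovy `.sum()` call is an assumption about the 1.x scripting environment. Building the request body programmatically also avoids quoting mistakes:

```python
import json

# Hypothetical variant of the query above: sum every nested sales.count
# for the matched author document (field paths taken from the schema;
# the Groovy .values/.sum() behavior is an assumption, not verified).
script = (
    "counts = doc['books.sales.count'].values; "
    "counts ? _score * counts.sum() * param1 : _score"
)
query = {
    "query": {
        "function_score": {
            "boost_mode": "replace",
            "query": {"match": {"bookName": "java"}},
            "script_score": {
                "params": {"param1": 2},
                "script": script,
                "lang": "groovy",
            },
        }
    }
}
body = json.dumps(query, indent=2)
print(body)
```

If the nested values are not reachable through `doc[...]` at the author level, the usual alternative is a nested query/filter combined with a top-level sort, rather than script_score.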
Re: JVM crash on 64 bit SPARC with Elasticsearch 1.2.2 due to unaligned memory access
Thanks for the logstash mapping command. I can reproduce it now. It's the LZF encoder that bails out at org.elasticsearch.common.compress.lzf.impl.UnsafeChunkEncoderBE._getInt, which in turn uses sun.misc.Unsafe.getInt. I have created a gist of the JVM crash file at https://gist.github.com/jprante/79f4b4c0b9fd83eb1c9b There has been a fix in LZF lately https://github.com/ning/compress/commit/db7f51bddc5b7beb47da77eeeab56882c650bff7 for version 1.0.3, which has been released recently. I will build a snapshot ES version with LZF 1.0.3 and see if this works... Jörg

On Mon, Aug 25, 2014 at 11:30 PM, tony.apo...@iqor.com wrote: I captured a WireShark trace of the interaction between ES and Logstash 1.4.1. The error occurs even before my data is sent. Can you try to reproduce it on your testbed with this message I captured?

curl -XPUT http://amssc103-mgmt-app2:9200/_template/logstash -d @y

Contents of file 'y':

{
  "template": "logstash-*",
  "settings": { "index.refresh_interval": "5s" },
  "mappings": {
    "_default_": {
      "_all": { "enabled": true },
      "dynamic_templates": [
        {
          "string_fields": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "analyzed",
              "omit_norms": true,
              "fields": {
                "raw": { "type": "string", "index": "not_analyzed", "ignore_above": 256 }
              }
            }
          }
        }
      ],
      "properties": {
        "@version": { "type": "string", "index": "not_analyzed" },
        "geoip": {
          "type": "object",
          "dynamic": true,
          "path": "full",
          "properties": {
            "location": { "type": "geo_point" }
          }
        }
      }
    }
  }
}

On Monday, August 25, 2014 3:53:18 PM UTC-4, tony@iqor.com wrote: I have no plugins installed (yet) and only changed es.logger.level to DEBUG in logging.yml.
elasticsearch.yml:

cluster.name: es-AMS1Cluster
node.name: KYLIE1
node.rack: amssc2client02
path.data: /export/home/apontet/elasticsearch/data
path.work: /export/home/apontet/elasticsearch/work
path.logs: /export/home/apontet/elasticsearch/logs
network.host: # sanitized line; file contains actual server IP
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: [s1, s2, s3, s5, s6, s7]  # also sanitized

Thanks, Tony

On Saturday, August 23, 2014 6:29:40 AM UTC-4, Jörg Prante wrote: I tested a simple Hello World document on Elasticsearch 1.3.2 with Oracle JDK 1.7.0_17 64-bit Server VM, Sparc Solaris 10, default settings. No issues. So I would like to know more about the settings in elasticsearch.yml, the mappings, and the installed plugins. Jörg

On Sat, Aug 23, 2014 at 11:25 AM, joerg...@gmail.com wrote: I have some Solaris 10 Sparc V440/V445 servers available and can try to reproduce over the weekend. Jörg

On Sat, Aug 23, 2014 at 4:37 AM, Robert Muir rober...@elasticsearch.com wrote: How big is it? Maybe I can have it anyway? I pulled two ancient UltraSPARCs out of my closet to try to debug your issue, but unfortunately they are a pita to work with (dead NVRAM battery on both, zeroed MAC address, etc.). I'd still love to get to the bottom of this.

On Aug 22, 2014 3:59 PM, tony@iqor.com wrote: Hi Adrien, It's a bunch of garbled binary data, basically a dump of the process image. Tony

On Thursday, August 21, 2014 6:36:12 PM UTC-4, Adrien Grand wrote: Hi Tony, Do you have more information in the core dump file? (cf. the "Core dump written" line that you pasted)

On Thu, Aug 21, 2014 at 7:53 PM, tony@iqor.com wrote: Hello, I installed ES 1.3.2 on a spare Solaris 11 / T4-4 SPARC server to scale out of a small x86 machine. I get a similar exception running ES with JAVA_OPTS=-d64.
When Logstash 1.4.1 sends the first message I get the error below on the ES process:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0xa) at pc=0x7a9a3d8c, pid=14473, tid=209
#
# JRE version: 7.0_25-b15
# Java VM: Java HotSpot(TM) 64-Bit Server VM (23.25-b01 mixed mode solaris-sparc compressed oops)
# Problematic frame:
# V  [libjvm.so+0xba3d8c]  Unsafe_GetInt+0x158
#
# Core dump written. Default location: /export/home/elasticsearch/elasticsearch-1.3.2/core or core.14473
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
#

--- T H R E A D ---

Current thread (0x000107078000): JavaThread elasticsearch[KYLIE1][http_server_worker][T#17]{New I/O worker #147} daemon [_thread_in_vm, id=209, stack(0x5b80, 0x5b84)]

siginfo: si_signo=SIGBUS: si_errno=0, si_code=1 (BUS_ADRALN), si_addr=0x000709cc09e7

I can run ES using 32-bit Java but have to shrink ES_HEAP_SIZE more than I want to. Any assistance would be appreciated. Regards, Tony

On Tuesday, July 22, 2014 5:43:28 AM UTC-4, David Roberts wrote:
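The si_code BUS_ADRALN in the crash above means the fault is an alignment trap: sun.misc.Unsafe.getInt performs a single 4-byte load, which SPARC only permits at 4-byte-aligned addresses, while a byte-wise read (the approach of the LZF fix referenced in this thread) works at any offset. A minimal illustration of the alignment-safe alternative, sketched in Python with big-endian order as on SPARC:

```python
# Reassemble a 4-byte big-endian int one byte at a time - the
# alignment-safe alternative to a single unaligned 32-bit load.
def get_int_be(buf: bytes, offset: int) -> int:
    return (
        (buf[offset] << 24)
        | (buf[offset + 1] << 16)
        | (buf[offset + 2] << 8)
        | buf[offset + 3]
    )

data = bytes([0x00, 0x12, 0x34, 0x56, 0x78])
# Offset 1 is not 4-byte aligned; a raw 32-bit load at that address
# would raise SIGBUS/BUS_ADRALN on SPARC, but byte-wise access is fine.
print(hex(get_int_be(data, 1)))  # 0x12345678
```

This is why the crash only shows up on strict-alignment hardware: on x86 the unaligned Unsafe.getInt merely runs slower, so the bug in the encoder went unnoticed.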
Re: indices.memory.index_buffer_size
Thanks Mark. What confuses me are "global setting" (which suggests a cluster-wide setting) and "on a specific node" (which suggests a node-level setting). I could just try it out, but it's hard to tell if the setting worked or not. :(

On Sunday, August 24, 2014 3:13:17 PM UTC-7, Mark Walkom wrote: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-indices.html states "It is a global setting that bubbles down to all the different shards allocated on a specific node." Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 25 August 2014 03:12, Yongtao You yongt...@gmail.com wrote: Hi, Is the indices.memory.index_buffer_size configuration a cluster-wide configuration or a per-node configuration? Do I need to set it on every node? Or just the master (eligible) node? Thanks. Yongtao
Re: indices.memory.index_buffer_size
I just looked at this code! It's a setting that you set globally at the cluster level. It takes effect per node. What that means is that every active shard on a node gets an equal share of that much space. "Active" means it has been written to in the past six minutes or so. When a node first starts, all shards are assumed active, and those that are not updated at all lose active status after the timeout. You can watch the little dance it does by setting index.engine.internal: DEBUG in logging.yml. Now - I'm not actually sure how important a setting it is. I opened https://github.com/elasticsearch/elasticsearch/issues/7441 to suggest allowing it to be spread around better. Mike'll probably close it if spreading it around wouldn't really help things much. Nik
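In other words, each node divides its own indexing buffer among its active shards, regardless of cluster size. A rough sketch of the arithmetic, based on the per-node division described above (the 10% figure is the documented default for indices.memory.index_buffer_size; exact accounting inside ES may differ):

```python
def per_shard_buffer_mb(heap_mb: int, active_shards: int,
                        index_buffer_pct: float = 0.10) -> float:
    """Each node carves index_buffer_pct of its heap into an indexing
    buffer, then splits it evenly across that node's active shards."""
    total = heap_mb * index_buffer_pct
    return total / max(active_shards, 1)

# Hypothetical node with a 4 GB heap and 20 active shards:
print(per_shard_buffer_mb(4096, 20))  # 20.48 MB per shard
```

This also explains why setting it only on the master would have no effect on data nodes: every node applies the value to its own heap.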
Re: JVM crash on 64 bit SPARC with Elasticsearch 1.2.2 due to unaligned memory access
Still broken with lzf-compress 1.0.3: https://gist.github.com/jprante/d2d829b497db4963aea5 Jörg
Elastic HQ not getting back vendor info from Elasticsearch.
I posted an issue with Elastic HQ here: https://github.com/royrusso/elasticsearch-HQ/issues/164 But just in case, maybe an Elastic dev can have a look and see if it's an Elasticsearch issue or not. Thanks
Re: groovy for scripting
Providing a self-update: I found that I could create a cross-request cache using the following script (like a cross-request incrementer):

POST /test/_search
{
  "query": { "match_all": {} },
  "script_fields": {
    "a": {
      "script": "import groovy.lang.Script;class A extends Script{static i=0;def run() {i++}}",
      "lang": "groovy"
    }
  }
}

Formatted for readability, the script is:

import groovy.lang.Script
class A extends Script {
    static i = 0
    def run() {
        i++
    }
}

Here the *i* variable is not thread-safe, but the idea is clear: you define a class inherited from Script and implement the abstract method run. This class is then accessible on each node thread. Now I'm looking for a solution to make a query-scoped counter (for a one-node configuration). I think it could be done by passing a unique query_id in the parameters, but I'm afraid of making the code non-thread-safe, or vice versa - thread-safe but with reduced performance. Researching more...
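The thread-safety concern above is real: a bare static counter incremented from several threads loses updates, because i++ is a read-modify-write. The generic fix is to guard the increment (in a Groovy script, java.util.concurrent.atomic.AtomicInteger would play this role); a sketch of the locked-counter pattern in Python:

```python
import threading

class SafeCounter:
    """Counter guarded by a lock - the generic fix for a racy 'i++'."""
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self) -> int:
        with self._lock:
            self._value += 1
            return self._value

counter = SafeCounter()
threads = [
    threading.Thread(target=lambda: [counter.increment() for _ in range(1000)])
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter._value)  # 8000 - no lost updates
```

The cost is contention on the lock, which matches the performance trade-off mentioned above; an atomic integer avoids the lock but still serializes the increments.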
terms_stats facet sometimes returns meaningless numbers
Our Elasticsearch instance sometimes returns meaningless numbers for terms_stats, while the query itself returns correct data. I am using Kibana as a front end; this is the generated query:

{facets:{terms:{terms_stats:{value_field:metric,key_field:host,size:10,order:count},facet_filter:{fquery:{query:{filtered:{query:{bool:{should:[{query_string:{query:(service:\.StorageProxy.RecentReadLatencyMicros\) AND (layer:\cassandra\) AND (@timestamp:[now-1m TO now]) AND host:169.26.4.167}}]}},filter:{bool:{must:[{range:{@timestamp:{from:1408992976055,to:now}}},{terms:{host:[169.26.4.167]}},{terms:{host:[169.26.4.167]}}],size:0}

max=4.6366831074216192E+18 mean=1.5455610358072064E+18 min=0 term=169.26.4.167 total=4.6366831074216192E+18

The metric field should have numbers between 0 and 100, while the terms_stats facet reports huge numbers. If I delete the index, it shows correct term stats again. I tried refresh and close/open of the index; nothing seems to work except deleting the index and recreating it. Has anyone faced a similar issue? Thanks.
Failing Replica Shards
Hello, In the past couple of days I've been getting a lot of error messages about corrupted replica shards. The primary shards come up fast after an ES process restart, but replicas take a long time to come back. Sometimes it takes a few node restarts to 'kick' the nodes into starting the replica shards. ES version is 1.3.1 running on CentOS 6.5 hosted at Softlayer. It's a 3-way cluster with 4 logstash feeders hanging off it. Here are the errors:

[2014-08-26 15:01:18,682][WARN ][cluster.action.shard ] [log03 / Salvador Dali] [downloader-2014.08][4] received shard failed for [downloader-2014.08][4], node[l9-BQTHSSF-ElhgpPBZ24w], [R], s[INITIALIZING], indexUUID [2vRrb5YlQP6MTVr1chOezg], reason [engine failure, message [corrupted preexisting index][CorruptIndexException[[downloader-2014.08][4] Corrupted index [corrupted_SkU0-ZHZRxivSnGczABb_g] caused by: CorruptIndexException[codec footer mismatch: actual footer=-1676705023 vs expected footer=-1071082520 (resource: NIOFSIndexInput(path=/acc/ES/NBS/nodes/0/indices/downloader-2014.08/4/index/_k9a_es090_0.doc))

[2014-08-26 15:01:18,682][WARN ][cluster.action.shard ] [log03 / Salvador Dali] [eventlog-2014.06][0] received shard failed for [eventlog-2014.06][0], node[l9-BQTHSSF-ElhgpPBZ24w], [R], s[INITIALIZING], indexUUID [jbvChdRrRB6HTutxPvxMmQ], reason [engine failure, message [corrupted preexisting index][CorruptIndexException[[eventlog-2014.06][0] Corrupted index [corrupted__712QIBQQqafzpBoQwZtcg] caused by: CorruptIndexException[codec footer mismatch: actual footer=0 vs expected footer=-1071082520 (resource: NIOFSIndexInput(path=/acc/ES/NBS/nodes/0/indices/eventlog-2014.06/0/index/_1k4x.nvd))

[2014-08-26 15:01:18,684][WARN ][cluster.action.shard ] [log03 / Salvador Dali] [eventlog-2014.07][0] received shard failed for [eventlog-2014.07][0], node[l9-BQTHSSF-ElhgpPBZ24w], [R], s[INITIALIZING], indexUUID [T4tTXkPjTaCdSVNTjHfOcg], reason [engine failure, message [corrupted preexisting index][CorruptIndexException[[eventlog-2014.07][0]
Corrupted index [corrupted_OzfNRRGyTIq8a1PRhLYG2w] caused by: CorruptIndexException[codec footer mismatch: actual footer=0 vs expected footer=-1071082520 (resource: NIOFSIndexInput(path=/acc/ES/NBS/nodes/0/indices/eventlog-2014.07/0/index/_rqf.nvd)) Thanks, David
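The "expected footer=-1071082520" in these messages is Lucene's codec footer magic, which is defined as the bitwise complement of its header magic 0x3fd76c17; an "actual footer=0" therefore means the tail of the file was zeroed out (truncated or never fully written) rather than subtly bit-flipped. The constant can be checked directly:

```python
# Lucene's CodecUtil header magic; the footer magic is its bitwise
# complement, interpreted as a signed 32-bit int.
CODEC_MAGIC = 0x3FD76C17

def to_signed32(x: int) -> int:
    """Fold a Python int into the signed 32-bit range Java uses."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x

FOOTER_MAGIC = to_signed32(~CODEC_MAGIC)
print(FOOTER_MAGIC)  # -1071082520, matching "expected footer" in the logs
```

Zeroed file tails on replicas usually point at the filesystem or an unclean shutdown rather than at an ES bug, which is worth checking on the affected hosts.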
Data per node in ES
Hi, We are analyzing ES for storing our log data (~400 GB/day) and will be integrating Logstash and ES. What is the maximum amount of data that can be stored on one node of ES? Regards, Gaurav
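There is no fixed per-node maximum; the practical ceiling depends on heap, disk, and query load. What can be estimated up front is the raw storage the ingest rate implies. A back-of-the-envelope sketch (the retention, replica count, and node count below are hypothetical inputs, not recommendations):

```python
def raw_storage_per_node_gb(gb_per_day: float, retention_days: int,
                            replicas: int, nodes: int) -> float:
    """Total index size across the cluster, divided evenly per node.
    Ignores compression, mapping overhead, and uneven shard allocation."""
    total = gb_per_day * retention_days * (1 + replicas)
    return total / nodes

# 400 GB/day, 30-day retention, 1 replica, 6 nodes (all hypothetical):
print(raw_storage_per_node_gb(400, 30, 1, 6))  # 4000.0 GB per node
```

Running the numbers this way makes it easy to see whether a proposed node count keeps per-node data at a level the hardware can serve.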
Re: Reduce Number of Segments
OK, I would suggest setting index.merge.scheduler.max_thread_count to 1 for spinning disks. Maybe also try disabling merge throttling and see if that has an effect? 6 MB/sec seems slow... Mike McCandless http://blog.mikemccandless.com

On Mon, Aug 25, 2014 at 8:57 PM, Chris Decker ch...@chris-decker.com wrote: Mike, Thanks for the response. I'm running ES 1.2.1. It appears the fix for the issue you reported was included with ES 1.2.0. Any other ideas / suggestions? Were the settings that I posted sane? Thanks!, Chris

On Monday, August 25, 2014 1:52:46 PM UTC-4, Michael McCandless wrote: Which version of ES are you using? Versions before 1.2 have a bug that caused merge throttling to throttle far more than requested, such that you couldn't get any faster than ~8 MB/sec. See https://github.com/elasticsearch/elasticsearch/issues/6018 Tiered merge policy is best. Mike McCandless http://blog.mikemccandless.com

On Mon, Aug 25, 2014 at 1:08 PM, Chris Decker ch...@chris-decker.com wrote: All, I'm looking for advice on how to reduce the number of segments for my indices, because in my use case (log analysis) quick searches are more important than real-time access to data. I've turned many of the knobs available within ES, and read many blog postings, ES documentation, etc., but still feel like there is room for improvement. Specific questions I have:

1. How can I increase the current merge rate? According to Elastic HQ, my merge rate is 6 MB/s. I know I don't have SSDs, but with 15k drives it seems like I should be able to get better rates. I tried increasing indices.store.throttle.max_bytes_per_sec from the default of 20mb to 40mb in my templates, but I didn't see a noticeable change in disk IOps or the merge rate the next day. Did I do something incorrectly? I'm going to experiment with setting it overall with index.store.throttle.max_bytes_per_sec and removing it from my templates.

2.
Should I move away from the default merge policy, or stick with the default (tiered)?

Any advice you have is much appreciated; additional details on my situation are below.

- I generate 2 indices per day - "high" and "low". I usually end up with ~450 segments for my 'high' index (see attached), and another ~200 segments for my 'low' index, which I then optimize once I roll over to the next day's indices.
- 4 ES servers (soon to be 8). Each server has: 12 Xeon cores running at 2.3 GHz, 15k drives, 128 GB of RAM (68 GB for the OS / file system cache, 60 GB used by 2 JVMs).
- Index ~750 GB per day; 1.5 TB if you include the replicas.
- Relevant configs:

TEMPLATE:
index.refresh_interval: 60s
index.number_of_replicas: 1
index.number_of_shards: 4
index.merge.policy.max_merged_segment: 50g
index.merge.policy.segments_per_tier: 5
index.merge.policy.max_merge_at_once: 5
indices.store.throttle.max_bytes_per_sec: 40mb

ELASTICSEARCH.YML:
indices.memory.index_buffer_size: 30%

Thanks in advance!, Chris
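The store throttle discussed above can also be changed on a live cluster instead of through index templates; a sketch of the request body for the 1.x cluster settings API (endpoint path and setting name as used in this thread; verify against your version's documentation):

```python
import json

# Body for: curl -XPUT 'http://localhost:9200/_cluster/settings' -d @body.json
# Raises the cluster-wide store throttle from the 20mb default discussed above.
body = {
    "persistent": {
        "indices.store.throttle.max_bytes_per_sec": "40mb"
    }
}
print(json.dumps(body, indent=2))
```

Applying it dynamically lets you watch the merge rate react immediately in Elastic HQ, rather than waiting for the next day's indices to pick up a template change.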
Re: Elasticsearch for logging. HOW to configure automatic creation of the new index every day?
Hello Konstantin, You can use an index value of name-%{+YYYY.MM.dd} in your elasticsearch output in Logstash (link: http://logstash.net/docs/1.4.2/outputs/elasticsearch#index) HTH, David

On Tuesday, August 26, 2014 10:01:39 AM UTC-7, Konstantin Erman wrote: Most of the guides I could find recommend creating *one index per day* when Elasticsearch is used to store and query log files. Unfortunately, not a single guide dares to explain *HOW exactly* I should configure a freshly installed Elasticsearch to create a new index every day. Could somebody please help me with it? A few bits of additional info: I deal with Elasticsearch on Windows Server (or maybe on Azure, but not any Linux), and I plan to send log events to Elasticsearch using Serilog. Any advice for those special circumstances is appreciated. Thank you! Konstantin
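The key point is that nothing is configured in Elasticsearch itself: the client simply puts the current date into the index name on every write, and ES auto-creates an index the first time that name is used. A sketch of the naming convention (the "logs" prefix is an arbitrary example; a Serilog sink would apply the same pattern on each write):

```python
from datetime import datetime, timezone

def daily_index(prefix: str, when: datetime) -> str:
    """Index name of the form prefix-YYYY.MM.dd, the convention
    Logstash uses for one-index-per-day rollover."""
    return "%s-%s" % (prefix, when.strftime("%Y.%m.%d"))

print(daily_index("logs", datetime(2014, 8, 26, tzinfo=timezone.utc)))  # logs-2014.08.26
```

Since the date is computed at write time, rollover happens automatically at midnight with no scheduled job on the ES side.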
Re: indices.memory.index_buffer_size
See also https://github.com/elasticsearch/elasticsearch/pull/7440 (will be in 1.4.0), which returns the actual RAM buffer size assigned to each shard by the little dance. Mike McCandless http://blog.mikemccandless.com
Or just the master (eligible) node? Thanks. Yongtao -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/ msgid/elasticsearch/f67e3a30-521c-4c13-8620-c79133cea01c% 40googlegroups.com https://groups.google.com/d/msgid/elasticsearch/f67e3a30-521c-4c13-8620-c79133cea01c%40googlegroups.com?utm_medium=emailutm_source=footer . For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d76f4c67-9250-4ab9-b02d-0f0c78b33be6%40googlegroups.com https://groups.google.com/d/msgid/elasticsearch/d76f4c67-9250-4ab9-b02d-0f0c78b33be6%40googlegroups.com?utm_medium=emailutm_source=footer . For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAPmjWd1CmkjPAPJns3PjCmsFicu8KYV0DRjv9T2qacx636sy7g%40mail.gmail.com https://groups.google.com/d/msgid/elasticsearch/CAPmjWd1CmkjPAPJns3PjCmsFicu8KYV0DRjv9T2qacx636sy7g%40mail.gmail.com?utm_medium=emailutm_source=footer . For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. 
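Summing up the thread: the setting lives in each node's elasticsearch.yml, and each node divides its own budget among its active shards, so it needs to be set on every node, not just the master-eligible ones. A minimal sketch (the values here are illustrative, not recommendations):

```yaml
# elasticsearch.yml -- set on every node; each node divides this budget
# among its own active shards (illustrative value, not a recommendation)
indices.memory.index_buffer_size: 10%

# logging.yml -- watch the per-shard buffer "dance" Nik describes
index.engine.internal: DEBUG
```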
Re: Can't open file to read checksums
A few questions: What version of Elasticsearch are you using? Are you using the Java client, and is it the same version as the cluster? Did you upgrade recently, and was the index built with an older version of Elasticsearch? Elasticsearch recently added checksum verification (1.3?), so perhaps you have some sort of version mismatch. Cheers, Ivan

On Mon, Aug 25, 2014 at 10:52 AM, Casper Thrane casper.s.thr...@gmail.com wrote: Hi! We get the following errors on two of our nodes, and after that our cluster doesn't work. I have no idea what it means.

[2014-08-25 17:46:39,323][WARN ][indices.store] [p-elasticlog03] Can't open file to read checksums
java.io.FileNotFoundException: No such file [_6cq_es090_0.doc]
    at org.elasticsearch.index.store.DistributorDirectory.getDirectory(DistributorDirectory.java:173)
    at org.elasticsearch.index.store.DistributorDirectory.getDirectory(DistributorDirectory.java:144)
    at org.elasticsearch.index.store.DistributorDirectory.openInput(DistributorDirectory.java:130)
    at org.elasticsearch.index.store.Store$MetadataSnapshot.checksumFromLuceneFile(Store.java:532)
    at org.elasticsearch.index.store.Store$MetadataSnapshot.buildMetadata(Store.java:459)
    at org.elasticsearch.index.store.Store$MetadataSnapshot.init(Store.java:433)
    at org.elasticsearch.index.store.Store.readMetadataSnapshot(Store.java:271)
    at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.listStoreMetaData(TransportNodesListShardStoreMetaData.java:186)
    at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:140)
    at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:61)
    at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:277)
    at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:268)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

Br Casper
elasticsearch processing pipeline capability?
Is there any facility in elasticsearch to help with sending terms to an external process after Lucene processing (tokenization, filters, etc.)? The idea here is having some external analysis / NLP code run against the documents while keeping all the pre-processing choices consistent and in one place (i.e. the analysis setup in the elasticsearch index configuration). I am not very familiar with Lucene, but I believe their update request processor is possibly intended for scenarios like this that need a simple pipeline.
Re: terms_stats sometimes returns meaningless numbers
Additional information: taking the mean of a boolean value returns 4,607,182,418,800,017,408. Thanks

On Tuesday, August 26, 2014 3:58:43 PM UTC-4, youwei chen wrote: Our elasticsearch instance sometimes returns meaningless numbers for terms_stats; the query itself returns correct data. I am using Kibana as the front end; this is the generated query: {facets:{terms:{terms_stats:{value_field:metric,key_field:host,size:10,order:count},facet_filter:{fquery:{query:{filtered:{query:{bool:{should:[{query_string:{query:(service:\.StorageProxy.RecentReadLatencyMicros\) AND (layer:\cassandra\) AND (@timestamp:[now-1m TO now]) AND host:169.26.4.167}}]}},filter:{bool:{must:[{range:{@timestamp:{from:1408992976055,to:now}}},{terms:{host:[169.26.4.167]}},{terms:{host:[169.26.4.167]}}],size:0} max=4.6366831074216192E+18 mean=1.5455610358072064E+18 min=0 term=169.26.4.167 total=4.6366831074216192E+18 The metric field would have some number between 0 and 100, while the term stats report huge numbers. If I delete the index, it will show correct term stats again. I tried refresh and close/open index; nothing seems to work except deleting the index and recreating it. Has anyone faced a similar issue? Thanks.
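One hedged observation that may explain the specific number: 4,607,182,418,800,017,408 is exactly the IEEE-754 bit pattern of the double 1.0 read back as a long. That suggests (an assumption, not a confirmed diagnosis) that somewhere between indexing and the facet response a double value is being reinterpreted as an integer:

```java
public class DoubleBitsDemo {
    public static void main(String[] args) {
        // The "meaningless" mean-of-a-boolean reported above is the raw
        // bit pattern of 1.0 when those 64 bits are read as a long.
        long bits = Double.doubleToLongBits(1.0);
        System.out.println(bits);                                          // 4607182418800017408
        System.out.println(Double.longBitsToDouble(4607182418800017408L)); // 1.0
    }
}
```

Similarly, the reported max of 4.6366831074216192E+18 sits in the same range as bit patterns of small doubles, which is consistent with this reinterpretation idea.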
Re: Marvel not showing nodes stats
I'm experiencing a similar issue to this. We have two clusters: - 2 node monitoring cluster (1 master/data 1 just data) - 5 node production cluster (2 data, 3 masters) The output below is from the non-master data node of the Marvel monitoring cluster. There are no errors being reported by any of the production nodes. [2014-08-26 21:10:51,503][DEBUG][action.search.type ] [stage-search-marvel-1c] [.marvel-2014.08.26][2], node[iGRH8Gc2QO698RMlWy8rgQ], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@355e93ff] org.elasticsearch.transport.RemoteTransportException: [stage-search-marvel-1b][inet[/10.99.111.122:9300]][search/phase/query] Caused by: org.elasticsearch.search.SearchParseException: [.marvel-2014.08.26][2]: query[ConstantScore(BooleanFilter(+*:* +cache(_type:index_stats) +cache(@timestamp:[140908680 TO 140908746])))],from[-1],size[10]: Parse Failure [Failed to parse source [{size:10,query:{filtered:{query:{match_all:{}},filter:{bool:{must:[{match_all:{}},{term:{_type:index_stats}},{range:{@timestamp:{from:now-10m/m,to:now/m}}}],facets:{timestamp:{terms_stats:{key_field:index.raw,value_field:@timestamp,order:term,size:2000}},primaries.docs.count:{terms_stats:{key_field:index.raw,value_field:primaries.docs.count,order:term,size:2000}},primaries.indexing.index_total:{terms_stats:{key_field:index.raw,value_field:primaries.indexing.index_total,order:term,size:2000}},total.search.query_total:{terms_stats:{key_field:index.raw,value_field:total.search.query_total,order:term,size:2000}},total.merges.total_size_in_bytes:{terms_stats:{key_field:index.raw,value_field:total.merges.total_size_in_bytes,order:term,size:2000}},total.fielddata.memory_size_in_bytes:{terms_stats:{key_field:index.raw,value_field:total.fielddata.memory_size_in_bytes,order:term,size:2000]] at org.elasticsearch.search.SearchService.parseSource(SearchService.java:664) at org.elasticsearch.search.SearchService.createContext(SearchService.java:515) at 
org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:487) at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:256) at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:688) at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:677) at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: org.elasticsearch.search.facet.FacetPhaseExecutionException: Facet [timestamp]: failed to find mapping for index.raw at org.elasticsearch.search.facet.termsstats.TermsStatsFacetParser.parse(TermsStatsFacetParser.java:126) at org.elasticsearch.search.facet.FacetParseElement.parse(FacetParseElement.java:93) at org.elasticsearch.search.SearchService.parseSource(SearchService.java:648) ... 
9 more [2014-08-26 21:10:51,503][DEBUG][action.search.type ] [stage-search-marvel-1c] [.marvel-2014.08.26][2], node[iGRH8Gc2QO698RMlWy8rgQ], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@32f235e9] org.elasticsearch.transport.RemoteTransportException: [stage-search-marvel-1b][inet[/10.99.111.122:9300]][search/phase/query] Caused by: org.elasticsearch.search.SearchParseException: [.marvel-2014.08.26][2]: query[ConstantScore(BooleanFilter(+*:* +cache(_type:node_stats) +cache(@timestamp:[140908680 TO 140908746])))],from[-1],size[10]: Parse Failure [Failed to parse source [{size:10,query:{filtered:{query:{match_all:{}},filter:{bool:{must:[{match_all:{}},{term:{_type:node_stats}},{range:{@timestamp:{from:now-10m/m,to:now/m}}}],facets:{timestamp:{terms_stats:{key_field:node.ip_port.raw,value_field:@timestamp,order:term,size:2000}},master_nodes:{terms:{field:node.ip_port.raw,size:2000},facet_filter:{term:{node.master:true}}},os.cpu.usage:{terms_stats:{key_field:node.ip_port.raw,value_field:os.cpu.usage,order:term,size:2000}},os.load_average.1m:{terms_stats:{key_field:node.ip_port.raw,value_field:os.load_average.1m,order:term,size:2000}},jvm.mem.heap_used_percent:{terms_stats:{key_field:node.ip_port.raw,value_field:jvm.mem.heap_used_percent,order:term,size:2000}},fs.total.available_in_bytes:{terms_stats:{key_field:node.ip_port.raw,value_field:fs.total.available_in_bytes,order:term,size:2000}},fs.total.disk_io_op:{terms_stats:{key_field:node.ip_port.raw,value_field:fs.total.disk_io_op,order:term,size:2000]] at org.elasticsearch.search.SearchService.parseSource(SearchService.java:664) at
Re: Parent/Child query performance in version 1.1.2
Just wanted to close the loop on this in case anyone stumbles upon the same issue. After upgrading to version 1.3.2, which had the performance increase stemming from https://github.com/elasticsearch/elasticsearch/pull/5846, we were able to see a dramatic decrease in parent/child query latency. We're executing queries under 150ms, which is manageable for now, and we will be eagerly awaiting further improvements from the work Clinton highlighted here: https://github.com/elasticsearch/elasticsearch/issues/7394. Along the way in our testing we got a little confused, as we attempted to do our troubleshooting on 1 data node in order to keep things simple; this manifested in some misplaced assumptions around the performance increases that came from work released in 1.2.0. In our testing on a single node, we did _not_ observe a latency decrease at all when going from 1.1.2 to 1.3.2. However, when we changed our test cluster to use two data nodes, we saw a huge improvement. So my earlier assertion around not seeing those improvements in version 1.3.2 was incorrect, although I'm still confused as to why a single-node configuration was not benefiting. In any case, I wanted to thank the ES developers for being generous with their time helping us track this issue down. Now that I realize the incredible pace at which ES versions are released, we'll be much more vigilant about keeping up. Thanks again!

On Monday, August 25, 2014 11:32:38 AM UTC-4, Mark Greene wrote: Hey Clinton, Thanks for the heads up on what's on the horizon. That definitely sounds like a drastic improvement. That being said, my fear here is that even with that improvement, this data model (parent/child) doesn't seem to be that performant with a moderate amount of documents. In order for us to really adopt this methodology of using parent/child, we'd expect to see sub-100ms performance so long as we were feeding ES with enough RAM.
My hunch here is there must be some code path that is hit when running on more than 1 data node that either doesn't write to the cache, or skips it on the read and hits the disk. We don't have a ton of load on our data nodes; CPU is well under 30% and IOWait is usually under 0.30. Just to reiterate: when we run the parent/child query on one data node, it runs in less than 100ms; when it runs across two data nodes, it's 10s. This is being experienced on versions 1.1.2 and 1.3.2.

On Monday, August 25, 2014 10:55:15 AM UTC-4, Clinton Gormley wrote: Something else to note: parent-child now uses global ordinals to make queries 3x faster than they were previously, but global ordinals need to be rebuilt after the index has refreshed (assuming some data has changed). Currently there is no way to build p/c global ordinals eagerly (i.e. during the refresh phase), and so it happens on the first query after a refresh. 1.3.3 and 1.4.0 will include an option to allow eager building of global ordinals, which should remove this latency spike: https://github.com/elasticsearch/elasticsearch/issues/7394 You may want to consider increasing the refresh_interval so that global ordinals remain valid for longer.

On 25 August 2014 16:48, Mark Greene ma...@evertrue.com wrote: Hi Adrien, Thanks for reaching out. We actually were excited to see the performance improvements stated in the 1.2.0 release notes, so we upgraded to 1.3.2. We saw some performance improvement, but it wasn't orders of magnitude, and queries are still running very slow. We also tried your suggestion of using the 'preference=_local' query param, but we didn't see any difference there. Additionally, running the query 10 times, we saw no improvement in speed. Currently, the only major performance increase we've seen with parent/child queries is dropping down to 1 data node, at which point we see queries executing well under the 100ms mark.
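Clinton's refresh_interval suggestion above can be applied dynamically per index through the update-settings API; a minimal sketch (the index name and interval value here are placeholders, not recommendations):

```
curl -XPUT 'localhost:9200/my_index/_settings' -d '
{ "index": { "refresh_interval": "30s" } }'
```

A longer interval means global ordinals are invalidated less often, at the cost of changes taking longer to become searchable.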
On Friday, August 22, 2014 6:42:27 PM UTC-4, Adrien Grand wrote: Hi Mark, Given that you had 1 replica in your first setup, it could take several queries to warm up the field data cache completely. Does the query still take 16 seconds to run if you run it e.g. 10 times? (3 should be enough, but just to be sure.) Does it change anything if you query elasticsearch with preference=_local? This should be equivalent to your single-node setup, so it would be interesting to see if that changes something. As a side note, you might want to try out a more recent version of Elasticsearch, since parent/child performance improved quite significantly in 1.2.0 because of https://github.com/elasticsearch/elasticsearch/pull/5846

On Fri, Aug 22, 2014 at 11:15 PM, Mark Greene ma...@evertrue.com wrote: I wanted to update the list with an interesting piece of information. We found that when we took one of our two data nodes out of the cluster, leaving just one data node with no replicas, the query performance increased dramatically. The queries are now
Re: elasticsearch processing pipeline capability?
If you want to retrieve the term list of an index after Lucene processing via the REST HTTP API, you can try https://github.com/jprante/elasticsearch-index-termlist Jörg

On Tue, Aug 26, 2014 at 10:41 PM, Kevin B blaisde...@gmail.com wrote: Is there any facility in elasticsearch to help with sending terms to an external process after Lucene processing (tokenization, filters, etc.)? The idea here is having some external analysis / NLP code run against the documents while keeping all the pre-processing choices consistent and in one place (i.e. the analysis setup in the elasticsearch index configuration). I am not very familiar with Lucene, but I believe their update request processor is possibly intended for scenarios like this that need a simple pipeline.
How do I start elasticsearch as a service?
Forgive me, I'm a little lost. I am working on deploying elasticsearch on an AWS server. Previously in development I have started elasticsearch using ./bin/elasticsearch -Des.config=/etc/elasticsearch/elasticsearch.yml But in live deployment, I want to keep elasticsearch running as a service... I have 1.2.0 installed on Ubuntu 12.04 on my AWS instance. I run sudo /etc/init.d/elasticsearch start and I get: * Starting Elasticsearch server I check sudo /etc/init.d/elasticsearch status and I get: * elasticsearch is not running I'm not sure how to troubleshoot. Any advice or suggestions? Thanks
Re: How do I start elasticsearch as a service?
Check the logs under /var/log/elasticsearch; they should have something. Also please be aware that 1.2.0 has a critical bug and you should be using 1.2.1 instead. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 27 August 2014 08:42, Eric Greene ericdgre...@gmail.com wrote: Forgive me, I'm a little lost. I am working on deploying elasticsearch on an AWS server. Previously in development I have started elasticsearch using ./bin/elasticsearch -Des.config=/etc/elasticsearch/elasticsearch.yml But in live deployment, I want to keep elasticsearch running as a service... I have 1.2.0 installed on Ubuntu 12.04 on my AWS instance. I run sudo /etc/init.d/elasticsearch start and I get: * Starting Elasticsearch server I check sudo /etc/init.d/elasticsearch status and I get: * elasticsearch is not running I'm not sure how to troubleshoot. Any advice or suggestions? Thanks
Re: Elastic HQ not getting back vendor info from Elasticsearch.
ElasticHQ is a community plugin, so the ES devs can't help here. I have raised issues against ElasticHQ in the past and Roy has fixed them pretty quickly :) Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 27 August 2014 04:44, John Smith java.dev@gmail.com wrote: I posted an issue with Elastic HQ here: https://github.com/royrusso/elasticsearch-HQ/issues/164 But just in case, maybe an Elasticsearch dev can have a look and see if it's an Elasticsearch issue or not. Thanks
Aggregation query works with search, but not with msearch
I am trying to troubleshoot the following observation. The following code works as expected: Elasticsearch::Model.client.search search_type: 'count', index: target_indices, body: query Response: {took=2, timed_out=false, _shards={total=2, successful=2, failed=0}, hits={total=6, max_score=0.0, hits=[]}, aggregations={recent={doc_count=3, searches={buckets=[{key=user-1, doc_count=3}] However, when using the above in an msearch, the response is not useful: Elasticsearch::Model.client.msearch body: [{ search_type: 'count', index: target_indices, search: query }] Response: {responses=[{took=0, timed_out=false, _shards={total=2, successful=2, failed=0}, hits={total=6, max_score=0.0, hits=[]}}]} --- What am I missing?
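One hedged debugging suggestion, not a confirmed diagnosis: on the wire, _msearch is a sequence of header/body line pairs, with search_type belonging in the header line and the query plus aggregations in the body line. It may be worth capturing the actual request the Ruby client sends (e.g. via client logging or a proxy) and checking that the aggregations end up in the body line. A sketch of the expected raw format (index name and aggregation shape are placeholders):

```
POST /_msearch
{"index": "target_index", "search_type": "count"}
{"query": {"match_all": {}}, "aggs": {"recent": {"filter": {"range": {"@timestamp": {"gte": "now-1d"}}}}}}
```

If the aggregations are missing from the body line, the response would look exactly like the one above: correct hits totals but no aggregations key.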
alerting in Marvel
Hi, I started using Marvel for my cluster monitoring. Does Marvel have a way to set notifications, such as sending me an email if CPU load is over 80%? Thanks
Getting different results while using bool query vs bool query with function score query
I am trying to add a custom boost to the different should clauses in a bool query, but I get a different number of results when the 2 should clauses contain 2 simple query string queries directly, versus when the 2 should clauses contain 2 function_score queries wrapping the same simple query string queries. The following query returns 2 results for my data set: { query : { filtered : { query : { bool : { should : [ { simple_query_string : { query : 128, fields : [ content.name_enu.simple ] } }, { simple_query_string : { query : 128, fields : [ content.name_enu.simple_with_numeric ] } } ] } }, filter : { bool : { must : [ { term : { securityInfo.securityType : open } }, { bool : { must : [ { term : { sourceId.sourceSystem : jmeter_007971_numeric } }, { term : { sourceId.type : file } } ] } } ], _cache : true } } } }, fields : [ elementId, sourceId.id, sourceId.type, sourceId.sourceSystem, sourceVersion, content.name_enu ] } Whereas if I use the following query I get 5 results — the same simple query strings, but wrapped in function scores: { query : { filtered : { query : { bool : { should : [ { function_score : { query : { simple_query_string : { query : 128, fields : [ content.name_enu.simple ] } }, boost_factor : 1.5 } }, { function_score : { query : { simple_query_string : { query : 128, fields : [ content.name_enu.simple_with_numeric ] } }, boost_factor : 2.5 } } ] } }, filter : { bool : { must : [ { term : { securityInfo.securityType : open } }, { bool : { must : [ { term : { sourceId.sourceSystem : jmeter_007971_numeric } }, { term : { sourceId.type : file } } ] } } ], _cache : true } } } }, fields : [ elementId, sourceId.id, sourceId.type, sourceId.sourceSystem, sourceVersion, content.name_enu ] } From my understanding of how the should clause works, I was expecting both queries to return 5 results, but I am not able to understand why the 1st query returns 2 results for my data set.
The content.name_enu.simple field uses the simple analyzer, whereas simple_with_numeric uses a whitespace tokenizer and a lowercase filter.
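One hedged observation, not a confirmed diagnosis: the simple analyzer tokenizes on letters only, so a purely numeric query string like 128 analyzes to zero tokens against content.name_enu.simple, and a should clause that produces no tokens may be treated differently once it is wrapped in function_score. The _analyze API can show what each subfield actually sees (the index name below is a placeholder):

```
curl 'localhost:9200/_analyze?analyzer=simple' -d '128'
curl 'localhost:9200/my_index/_analyze?field=content.name_enu.simple_with_numeric' -d '128'
```

If the first call returns an empty token list while the second returns a 128 token, that asymmetry between the two subfields would be the first thing to investigate.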
Re: alerting in Marvel
Nope. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 27 August 2014 09:14, kti...@hotmail.com wrote: Hi, I started using Marvel for my cluster monitoring. Does Marvel have a way to set notifications, such as sending me an email if CPU load is over 80%? Thanks
Re: gateway.recover_after_nodes minimum_master_nodes in a distributed environment?
Only master-eligible nodes count for discovery.zen.minimum_master_nodes, so in your case it is 1. And that's bad, as you can end up with a split-brain situation. You should, if you can, make all three nodes master eligible. gateway.recover_after_nodes counts all nodes, as per http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-gateway.html#recover-after Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 26 August 2014 23:37, Chris Neal chris.n...@derbysoft.net wrote: Hello all, A question about gateway.recover_after_nodes and discovery.zen.minimum_master_nodes in a distributed ES cluster. By distributed I mean I have: 2 nodes that are data only: 'node.data' = 'true', 'node.master' = 'false', 'http.enabled' = 'false', and 1 node that is a master/search-only node: 'node.master' = 'true', 'node.data' = 'false', 'http.enabled' = 'true'. When setting discovery.zen.minimum_master_nodes, does the (n / 2) + 1 formula include *all* nodes of all types in the cluster, or just those that can be masters? Similarly, when setting gateway.recover_after_nodes, is this value the number of all nodes of all types in the cluster, or just those that are data nodes? Thank you very much for your time! Chris
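Mark's recommendation above, sketched as elasticsearch.yml for a three-node cluster in which all nodes are made master eligible (an illustrative sketch, not the poster's current config):

```yaml
# elasticsearch.yml on each of the three nodes
node.master: true
node.data: true                      # keep the data role where needed

# (3 / 2) + 1 = 2: quorum of master-eligible nodes, prevents split brain
discovery.zen.minimum_master_nodes: 2

# counts all nodes, data or not, per the gateway module docs linked above
gateway.recover_after_nodes: 3
```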
Re: How do I start elasticsearch as a service?
Thanks Mark, I found that if I comment out the line in elasticsearch.yml that sets the data path, it works. I will upgrade as you have suggested, thanks for that.

On Tuesday, August 26, 2014 4:04:05 PM UTC-7, Mark Walkom wrote: Check the logs under /var/log/elasticsearch; they should have something. Also please be aware that 1.2.0 has a critical bug and you should be using 1.2.1 instead. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 27 August 2014 08:42, Eric Greene ericd...@gmail.com wrote: Forgive me, I'm a little lost. I am working on deploying elasticsearch on an AWS server. Previously in development I have started elasticsearch using ./bin/elasticsearch -Des.config=/etc/elasticsearch/elasticsearch.yml But in live deployment, I want to keep elasticsearch running as a service... I have 1.2.0 installed on Ubuntu 12.04 on my AWS instance. I run sudo /etc/init.d/elasticsearch start and I get: * Starting Elasticsearch server I check sudo /etc/init.d/elasticsearch status and I get: * elasticsearch is not running I'm not sure how to troubleshoot. Any advice or suggestions? Thanks
Re: Data per node in ES
Depends. How much disk do you have? RAM? CPU? Java version and release? ES version? What's your query load like? Are you doing lots of aggregates or facets? The best way to know is to start using ELK on a platform indicative of your intended server size, see how much data a single node can handle, and then extrapolate. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 27 August 2014 06:24, Gaurav Tiwari gtins...@gmail.com wrote: Hi, We are analyzing ES for storing our log data (~400 GB/day) and will be integrating Logstash and ES. What is the maximum amount of data that can be stored on one node of ES? Regards, Gaurav
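As a rough illustration of the extrapolation Mark describes, here is a back-of-envelope sizing sketch. The 400 GB/day figure is from the question; the retention period, replica count, and node count are hypothetical inputs you would replace with your own, and real index size can differ substantially from raw log size depending on mappings and compression:

```java
// Back-of-envelope storage estimate for a logging cluster.
// Assumptions (hypothetical, for illustration only): even shard
// distribution and a 1:1 raw-log-to-index size ratio.
public class NodeSizing {

    /** Total cluster storage needed, in GB, including replica copies. */
    static double clusterStorageGb(double gbPerDay, int retentionDays, int replicas) {
        return gbPerDay * retentionDays * (1 + replicas);
    }

    /** Storage each node must hold if data spreads evenly. */
    static double perNodeGb(double clusterGb, int nodes) {
        return clusterGb / nodes;
    }

    public static void main(String[] args) {
        // 400 GB/day (from the question), 30-day retention, 1 replica, 5 nodes.
        double cluster = clusterStorageGb(400, 30, 1);
        System.out.printf("cluster: %.0f GB, per node: %.0f GB%n",
                cluster, perNodeGb(cluster, 5));
    }
}
```

The arithmetic only bounds disk; whether one node can actually serve that much data depends on the query/aggregation load Mark lists.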
Re: alerting in Marvel
Also, you should really be monitoring your systems and core measurements (disk, CPU, etc.) with something specific for the job. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 27 August 2014 09:16, Mark Walkom ma...@campaignmonitor.com wrote: Nope. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 27 August 2014 09:14, kti...@hotmail.com wrote: Hi, I started using Marvel for my cluster monitoring. Does Marvel have a way to set a notification, such as sending me an email if CPU load is over 80%? Thanks
Re: alerting in Marvel
Hi, My goal was to figure out if I need to scale out when there is a sudden spike in the load. Can you be more specific about "something specific for the job"?

On Tuesday, August 26, 2014 4:32:32 PM UTC-7, Mark Walkom wrote: Also, you should really be monitoring your systems and core measurements (disk, CPU, etc.) with something specific for the job. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 27 August 2014 09:16, Mark Walkom ma...@campaignmonitor.com wrote: Nope.

On 27 August 2014 09:14, kti...@hotmail.com wrote: Hi, I started using Marvel for my cluster monitoring. Does Marvel have a way to set a notification, such as sending me an email if CPU load is over 80%? Thanks
Re: Micro Analysis in Kibana
The question is how the micro analysis panel in Kibana could do this without setting 'not_analyzed' on the fields?

On Saturday, April 5, 2014 4:55:20 AM UTC+9, Binh Ly wrote: You'll need to set the field "name" to not_analyzed so that you can get a distinct value for the whole field (instead of tokenized values):

{
  "mappings": {
    "doc": {
      "properties": {
        "name": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}

After that, you can do a terms facet on "name" and you'll get the count that you want.
RE: alerting in Marvel
OK, so they are for monitoring the system running Elasticsearch. However, if I want to be notified of ES-specific data points, such as its JVM memory %, there doesn't seem to be a solution. Thanks

From: ma...@campaignmonitor.com Date: Wed, 27 Aug 2014 10:05:05 +1000 Subject: Re: alerting in Marvel To: elasticsearch@googlegroups.com

Nagios, Zabbix, PRTG, Observium, or anything cloud hosted. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 27 August 2014 09:59, kti...@hotmail.com wrote: Hi, My goal was to figure out if I need to scale out when there is a sudden spike in the load. Can you be more specific about "something specific for the job"?

On Tuesday, August 26, 2014 4:32:32 PM UTC-7, Mark Walkom wrote: Also, you should really be monitoring your systems and core measurements (disk, CPU, etc.) with something specific for the job. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 27 August 2014 09:16, Mark Walkom ma...@campaignmonitor.com wrote: Nope.

On 27 August 2014 09:14, kti...@hotmail.com wrote: Hi, I started using Marvel for my cluster monitoring. Does Marvel have a way to set a notification, such as sending me an email if CPU load is over 80%? Thanks
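ES does expose JVM memory figures over its API (GET /_nodes/stats/jvm reports heap_used_in_bytes and heap_max_in_bytes per node), so an external monitor can derive the percentage and alert on it. A minimal sketch of the threshold check itself; the byte counts here are hypothetical stand-ins for values a real script would parse out of the stats response:

```java
// Sketch of a JVM-heap alert check for an external monitor.
// The two byte counts would come from GET /_nodes/stats/jvm
// (heap_used_in_bytes / heap_max_in_bytes); here they are made up.
public class HeapAlert {

    /** Heap usage as a percentage of the configured maximum. */
    static double heapPercent(long heapUsedBytes, long heapMaxBytes) {
        return 100.0 * heapUsedBytes / heapMaxBytes;
    }

    /** True when usage reaches the alert threshold. */
    static boolean shouldAlert(long used, long max, double thresholdPercent) {
        return heapPercent(used, max) >= thresholdPercent;
    }

    public static void main(String[] args) {
        long used = 12_884_901_888L; // 12 GB used (hypothetical)
        long max  = 17_179_869_184L; // 16 GB heap (hypothetical)
        System.out.printf("heap: %.1f%%, alert at 80%%: %b%n",
                heapPercent(used, max), shouldAlert(used, max, 80.0));
    }
}
```

Any of the tools Mark lists (Nagios, Zabbix, etc.) can wrap a check like this around the stats endpoint.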
Re: gateway.recover_after_nodes minimum_master_nodes in a distributed environment?
Thank you Mark. Makes perfect sense. Chris

On Tue, Aug 26, 2014 at 6:25 PM, Mark Walkom ma...@campaignmonitor.com wrote: Only master-eligible nodes count for discovery.zen.minimum_master_nodes, so in your case it is 1. And that's bad, as you can end up with a split-brain situation. You should, if you can, make all three nodes master eligible. gateway.recover_after_nodes is all nodes, as per http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-gateway.html#recover-after Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 26 August 2014 23:37, Chris Neal chris.n...@derbysoft.net wrote: Hello all, A question about gateway.recover_after_nodes and discovery.zen.minimum_master_nodes in a distributed ES cluster. By distributed I mean I have:

2 nodes that are data only: 'node.data' = 'true', 'node.master' = 'false', 'http.enabled' = 'false'
1 node that is a master/search-only node: 'node.master' = 'true', 'node.data' = 'false', 'http.enabled' = 'true'

When setting discovery.zen.minimum_master_nodes, does the (n / 2) + 1 formula include *all* nodes of all types in the cluster, or just those that can be masters? Similarly, when setting gateway.recover_after_nodes, is this value the number of all nodes of all types in the cluster, or just those that are data nodes? Thank you very much for your time! Chris
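The (n / 2) + 1 quorum rule discussed here, counted over master-eligible nodes only, comes down to integer arithmetic (the helper name below is mine):

```java
// Quorum calculation for discovery.zen.minimum_master_nodes.
// n counts MASTER-ELIGIBLE nodes only, not all nodes in the cluster.
public class Quorum {

    static int minimumMasterNodes(int masterEligibleNodes) {
        return masterEligibleNodes / 2 + 1;
    }

    public static void main(String[] args) {
        // With a single master-eligible node the formula gives 1, which
        // cannot protect against split brain; three master-eligible nodes
        // give a quorum of 2 and tolerate one node loss.
        System.out.println(minimumMasterNodes(1)); // 1
        System.out.println(minimumMasterNodes(3)); // 2
    }
}
```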
Re: Is it possible to register a RestFilter without creating a plugin?
Thanks Vineeth, But I think it would not change anything about what the REST API does if Elasticsearch offered some way, easier than building a plugin, to register RestFilters on REST API calls. Many frameworks provide a configuration-based approach to register pre/post processors around services. I hope ES provides this kind of mechanism, but my first impression is that it does not have such support at this time. Regards, Jack Jinyuan (Jack) Zhou

On Tue, Aug 26, 2014 at 5:57 AM, vineeth mohan vm.vineethmo...@gmail.com wrote: Hello Jinyuan, I don't feel this is possible. With such a provision, how would you define what the REST API will do? Thanks Vineeth

On Tue, Aug 26, 2014 at 2:41 AM, Jinyuan Zhou zhou.jiny...@gmail.com wrote: Thanks,
Re: Function Query with an aggregation function of nested field
Any thoughts, anyone? I am primarily looking for an answer to my 2nd question.

On Tuesday, August 26, 2014 10:14:37 AM UTC-7, Srinivasan Ramaswamy wrote: I have documents with the following schema:

{
  "authorId": 10,
  "authorName": "Joshua Bloch",
  "books": [
    {
      "bookId": 101,
      "bookName": "Effective Java",
      "description": "effective java book with useful recommendations",
      "Category": 1,
      "sales": [
        { "keyword": "effective java", "count": 200 },
        { "keyword": "java tips", "count": 100 },
        { "keyword": "java joshua bloch", "count": 50 }
      ],
      "createDate": "08-25-2014"
    },
    {
      "bookId": 102,
      "bookName": "Java Puzzlers",
      "description": "Java Puzzlers: Traps, Pitfalls, and Corner Cases",
      "Category": 2,
      "sales": [
        { "keyword": "java puzzlers", "count": 100 },
        { "keyword": "joshua bloch puzzler", "count": 50 }
      ]
    }
  ]
}

The sales information is stored with each book along with the search query that led to that sale. If the user applied a category filter, I would like to count only books that belong to that category. I would like to sort the list of authors returned based on a function of sales data and text match. For example, if the search query is "java", I would like to return the above document and all other author documents which contain the term "java". I came up with the following query:

{
  "query": {
    "function_score": {
      "boost_mode": "replace",
      "query": { "match": { "bookName": "java" } },
      "script_score": {
        "params": { "param1": 2 },
        "script": "doc['books.sales.count'].isEmpty() ? _score : _score * doc['books.sales.count'].value * param1"
      }
    }
  }
}

I have a few questions about this query:

1. The results don't look sorted by sales; authors who don't have any books with sales appear at the top.
2. How do I use the sum of all sales for an author (across all books within the author document) in the script? Is there a sum function for the nested fields inside a document when using script_score? Note that sales is a nested field inside another nested field, books.
3. As a next step I would also like to use a filter for keyword within the script_score, to only include sales whose keyword value matches the search query term.

Any help would be much appreciated. Thanks Srini
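On question 2, one thing worth knowing: nested documents are indexed as separate Lucene documents, so values reached via doc['books.sales.count'] from the root document's script context may not be what you expect. A way to get a reliable sum over nested sales is a nested aggregation with a sum sub-aggregation. A sketch, with my own aggregation names, and with the caveat that this produces a total over the matched set (per-author totals would need a further grouping, and using the sum as a sort key is a separate problem):

```json
{
  "query": { "match": { "bookName": "java" } },
  "aggs": {
    "all_sales": {
      "nested": { "path": "books.sales" },
      "aggs": {
        "total_count": { "sum": { "field": "books.sales.count" } }
      }
    }
  }
}
```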
got QueryPhaseExecutionException when using custom query parser
Hi all, I wrote my own custom query parser and extended Elasticsearch as a plugin; the code is in the following links: query parser http://pastebin.mozilla.org/6172836, customized query http://pastebin.mozilla.org/6172837, plugin http://pastebin.mozilla.org/6172844. I used the default settings of Elasticsearch, and the documents I PUT are { "test": "haha" } and { "test": "ahah" }. I used the query:

{ "query": { "backwards": { "test": "haha" } } }

And the error message I got is:

[2014-08-27 13:26:41,678][DEBUG][action.search.type ] [Poison] [test][2], node[w4ORe_ERQBeOVpII3P9w1w], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@7e1416e] lastShard [true]
org.elasticsearch.search.query.QueryPhaseExecutionException: [test][2]: query[filtered(BackwardsQuery: test:ahah)->cache(_type:test)],from[0],size[10]: Query Failed [Failed to execute main query]
    at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:162)
    at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:261)
    at org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:206)
    at org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:203)
    at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:517)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
    at org.elasticsearch.backwardstermquery.BackwardsTermQuery$BackwardsScorer.docID(BackwardsTermQuery.java:118)
    at org.elasticsearch.backwardstermquery.BackwardsTermQuery$BackwardsScorer.nextDoc(BackwardsTermQuery.java:133)
    at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:192)
    at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:163)
    at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:35)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
    at org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:175)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:491)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:448)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:281)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:269)
    at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:156)
    ... 7 more

I am very confused by it; could someone please point out what's wrong? Thank you so much!
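The NPE is thrown inside BackwardsScorer.docID() while Weight$DefaultBulkScorer.scoreAll() drives the iteration, which usually points at the scorer dereferencing a null field (e.g. an unset wrapped iterator) or relying on state that was never initialized. Lucene treats a scorer as a stateful doc-ID iterator: docID() must be safe to call at any point, starting at -1 before the first nextDoc(), advancing monotonically, and ending at a NO_MORE_DOCS sentinel. A self-contained sketch of that contract, independent of Lucene (all names here are mine, not the Lucene API):

```java
// Sketch of the doc-ID iterator state machine a Lucene scorer must follow.
// Plain Java, not Lucene: it only illustrates the bookkeeping that
// DefaultBulkScorer.scoreAll() exercises via nextDoc()/docID().
public class DocIdIteratorSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE; // exhaustion sentinel

    private final int[] matches;   // matching doc IDs, ascending
    private int pos = -1;          // cursor into matches
    private int doc = -1;          // docID() must report -1 before iteration

    DocIdIteratorSketch(int[] matches) { this.matches = matches; }

    int docID() { return doc; }    // never dereferences anything nullable

    int nextDoc() {
        pos++;
        doc = (pos < matches.length) ? matches[pos] : NO_MORE_DOCS;
        return doc;
    }

    public static void main(String[] args) {
        DocIdIteratorSketch it = new DocIdIteratorSketch(new int[]{0, 3});
        System.out.println(it.docID());   // -1 before the first nextDoc()
        System.out.println(it.nextDoc()); // 0
        System.out.println(it.nextDoc()); // 3
        System.out.println(it.nextDoc()); // NO_MORE_DOCS
        System.out.println(it.docID());   // stays at NO_MORE_DOCS
    }
}
```

Checking that BackwardsScorer.docID() (BackwardsTermQuery.java:118) reads only a primitive field maintained this way, rather than delegating to a possibly-null sub-scorer, would be the first thing to try.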