Index creation on Plugin instantiation
Hi, I'm experimenting with Elasticsearch plugin creation and I'm trying to create an index (if missing) on plugin startup. What is the best place to add the index-creation snippet? I have added it in an injected binding with Client as a constructor parameter, but I get the following error:

no known master node, scheduling a retry
[2015-05-26 12:03:27,289][ERROR][bootstrap] {1.4.1}: Initialization Failed ...
1) UncategorizedExecutionException[Failed execution]
ExecutionException[java.lang.NullPointerException]
NullPointerException

My guess is that the Client is not yet ready to handle index creation requests. My code snippet is the following:

public class IndexCreator {

    private final String indexName;
    private final ESLogger LOG;

    @Inject
    public IndexCreator(Settings settings, Client client) {
        this.LOG = Loggers.getLogger(getClass(), settings);
        this.indexName = settings.get("metis.index.name", ".metis");
        String indexName = ".metis-registry";
        IndicesExistsResponse resp = client.admin().indices().prepareExists(indexName).get();
        if (!resp.isExists()) {
            client.admin().indices().prepareCreate(indexName).get();
        }
    }
}

And I add this as a binding in my module:

public class MyModule extends AbstractModule {

    private final Settings settings;

    public MyModule(Settings settings) {
        this.settings = Preconditions.checkNotNull(settings);
    }

    @Override
    protected void configure() {
        bind(IndexCreator.class).asEagerSingleton();
    }
}

But it produces the above-mentioned error. Any ideas? Thanks, Thomas -- Please update your bookmarks! We have moved to https://discuss.elastic.co/ --- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e7616e28-b6aa-45b0-989f-5ee7d55c02ca%40googlegroups.com.
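The "no known master node" line suggests the eager singleton is instantiated before the node has joined a master, so the injected Client cannot serve admin requests yet. A common remedy (not shown in the post) is to defer the exists/create check until after startup, e.g. in a lifecycle component or a background retry loop. A language-neutral sketch of that retry shape, where `check` and `action` are hypothetical stand-ins for a cluster-ready check and the exists/create-index calls:

```python
import time

def retry_until_ready(check, action, attempts=10, delay_s=1.0):
    """Run `action` once `check()` succeeds; retry with linear backoff.

    Hypothetical names: `check` stands in for a cluster-health/master
    check, `action` for the index exists/create admin calls.
    """
    for attempt in range(attempts):
        if check():
            return action()
        time.sleep(delay_s * attempt)  # back off before the next probe
    raise RuntimeError("cluster never became ready")
```

In an actual plugin this would run on a background thread after node startup rather than inside a constructor, so module wiring never blocks on cluster state.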
For more options, visit https://groups.google.com/d/optout.
Re: Kibana 4 - ability to see source data from Dashboard
A colleague just pointed out that you can add a search to the dashboard. Seems to work :) On Tuesday, 14 April 2015 14:57:43 UTC+1, Thomas Bratt wrote: Hi, I can't seem to get access to the original data by drilling down on the visualizations on the dashboard. Am I missing something? Many thanks, Thomas
Kibana 4 - ability to select a date range on dashboard that is reflected in other visualizations
Hi, I am using Kibana 4 with a Date Histogram. I can select a time range with the mouse but the other visualizations on the dashboard do not seem to update. I only have data from today which might be affecting things. Would appreciate it if someone could tell me how to get this to work :) Thomas
Kibana 4 - ability to see source data from Dashboard
Hi, I can't seem to get access to the original data by drilling down on the visualizations on the dashboard. Am I missing something? Many thanks, Thomas
Re: Kibana: Mark warnings as solved
I know how to use a programming language and I could start my own project, but I would like to avoid that, since it leads to plumbing. I guess other people have the same use case, and I would like to use (and improve) an existing project, but I have not found any up to now. How do other ELK users solve my use case? I guess I am missing something. Regards, Thomas Güttler On Wednesday, 8 April 2015 11:02:35 UTC+2, James Green wrote: Couldn't you update the document with a flag on a field? On 8 April 2015 at 09:43, Thomas Güttler h...@tbz-pariv.de wrote: We are evaluating if ELK is the right tool for our logs and event messages. We need a way to mark warnings as done. All warnings of this type should be invisible in the future. Use case: There was a bug in our code and the dev team has created a fix. Continuous Integration is running, and soon the bug in the production system will be gone. We need a way to mark the warnings as "this type of warning is already handled, and the fix will be in the production system during the next three hours". Can you understand what I want? How to handle this with ELK? Just removing these logs from Elasticsearch is not a solution, since during the next hours (after setting the flag "done") new events can still come into the system.
Kibana: Mark warnings as solved
We are evaluating if ELK is the right tool for our logs and event messages. We need a way to mark warnings as done. All warnings of this type should be invisible in the future. Use case: There was a bug in our code and the dev team has created a fix. Continuous Integration is running, and soon the bug in the production system will be gone. We need a way to mark the warnings as "this type of warning is already handled, and the fix will be in the production system during the next three hours". Can you understand what I want? How to handle this with ELK? Just removing these logs from Elasticsearch is not a solution, since during the next hours (after setting the flag "done") new events can still come into the system.
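James's suggestion above (update the document with a flag on a field) can be sketched as two request bodies: a partial update that sets the flag, and a search filter that hides flagged warnings. Field names ("solved", "solved_at") are assumptions about your mapping; the `{"doc": ...}` shape matches the Elasticsearch update API (POST /index/type/id/_update), and the `filtered` query form is the Elasticsearch 1.x style:

```python
import json

# Partial update marking one warning document as handled.
# Field names are hypothetical.
mark_solved = {"doc": {"solved": True, "solved_at": "2015-04-08T10:00:00Z"}}
payload = json.dumps(mark_solved)

# A search can then hide flagged warnings with a must_not filter.
hide_solved = {
    "query": {"filtered": {"filter": {"bool": {
        "must_not": [{"term": {"solved": True}}]
    }}}}
}
```

New events arriving after the flag is set would still be visible, which matches the concern in the post: the flag lives on already-indexed documents, so incoming warnings of the same type need to be flagged too (e.g. by a periodic update-by-query style job).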
ELK for logfiles
Hi, I am planning to use ELK for our log files. I have read the docs about Logstash, Elasticsearch and Kibana, but the whole picture is still not solid, especially the reporting area. Kibana seems to be a great tool for visualization, but can I get at the single log line for debugging the root of problems? Example: I see that 99 systems work fine and 1 system emits warnings. Which interface could I use to see the logs in Elasticsearch for this system? Needed features: Show all logs from system foo in the period between 2015-03-27 00:00 and 00:10 (ten minutes). Show all logs with log level error of system foo on day 2015-03-27. Is Kibana the right tool for this? Or am I on the wrong track? Which tool could be used to analyze log data in Elasticsearch?
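The two "needed features" above map directly onto search request bodies (Kibana's Discover view builds queries like these under the hood). A sketch in the Elasticsearch 1.x `filtered` query style; the field names (`host`, `level`, `@timestamp`) are assumptions about a typical Logstash mapping:

```python
import json

# Feature 1: all logs from system foo in a ten-minute window.
logs_ten_minutes = {
    "query": {"filtered": {"filter": {"bool": {"must": [
        {"term": {"host": "foo"}},
        {"range": {"@timestamp": {"gte": "2015-03-27T00:00:00",
                                  "lt": "2015-03-27T00:10:00"}}},
    ]}}}},
    "sort": [{"@timestamp": "asc"}],
}

# Feature 2: all error-level logs of system foo for one day.
errors_for_day = {
    "query": {"filtered": {"filter": {"bool": {"must": [
        {"term": {"host": "foo"}},
        {"term": {"level": "error"}},
        {"range": {"@timestamp": {"gte": "2015-03-27", "lt": "2015-03-28"}}},
    ]}}}},
}
```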
Using ELK to analyze log warnings and exceptions - and mark them as solved
We run several servers running our code. Of course there are bugs which cause exceptions and warnings when something unusual occurs. I want to analyze our logs to find unhandled warnings. I am unsure if ELK can help us. There needs to be some way to aggregate warnings into a warning of type X (to remove duplicates). If a warning was handled and solved, we need a way to mark the warnings of type X as solved. The flag should only be set for a limited period of time (for example 48 hours). During this time the new code should be deployed and the error should not occur again. If it still occurs after N hours, the warning should become visible again. Can you understand what I want? Can this be done with ELK, or am I on the wrong track? Regards, Thomas Güttler
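The time-limited flag described above can be modeled without any scheduled un-marking: store an expiry timestamp on the warning, and have the search hide only warnings whose window is still open, so they resurface automatically after N hours. A sketch with a hypothetical `solved_until` field (Elasticsearch 1.x `filtered` query style):

```python
from datetime import datetime, timedelta

# Assumption: marking type X as solved stores an expiry timestamp.
def solve_mark(now, hours=48):
    """Partial-update body setting the hide-until timestamp."""
    return {"doc": {"solved_until": (now + timedelta(hours=hours)).isoformat()}}

# Hide only warnings whose window is still open. Once solved_until
# passes, the must_not range no longer matches and they reappear;
# documents without the field are never excluded.
def visible_warnings_query(now):
    return {"query": {"filtered": {"filter": {"bool": {"must_not": [
        {"range": {"solved_until": {"gt": now.isoformat()}}}
    ]}}}}}
```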
Easy ELK Stack setup
Hi, I want to set up an ELK stack without wasting time, which is why I am asking here before starting. My environment is simple: all traffic comes in from localhost. There is only one server for the ELK setup, though there will be several ELK stacks running in the future; again, all traffic will come in from localhost only, and the systems will run isolated. I see these solutions:
- use a Docker container
- do it by hand (RPM install)
- use Chef/Puppet (but up to now we don't use any of those tools)
- any other idea?
What do you think? Regards, Thomas Güttler
Re: Leaving out information from the response
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-fields.html On Wed, Feb 25, 2015 at 9:13 AM, James m...@employ.com wrote: Hi, I want to have certain data in my elasticsearch index but I don't want it to be returned with a query. At the moment it seems to return every bit of data I have for each index, and then I use my PHP application to hide it. Is it possible to select which fields elasticsearch returns in its response to my PHP application? For example, each item has: Name, Location, Description, Keywords, Unique ID, Create date. I just want to have in the response from elasticsearch: Name, Location, Description.
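Besides the `fields` parameter covered by the page linked above, source filtering in the search body does the same job: the response's `_source` is trimmed to the listed fields, so the PHP side no longer has to hide anything. A sketch (lowercase field names are an assumption about the mapping):

```python
import json

# _source filtering trims each hit's _source to the listed fields.
search_body = {
    "_source": ["name", "location", "description"],
    "query": {"match_all": {}},
}
payload = json.dumps(search_body)
```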
Re: Kibana 4 behind reverse proxy. Is it possible?
I hit the same issue when accessing the site using the DNS name. When I am on the machine itself, http://localhost:/ works though. I have not figured out the fix for this yet. It seems like a Kibana 4 CORS issue. On Thu, Jan 29, 2015 at 3:38 PM, Konstantin Erman kon...@gmail.com wrote: Yes, Kibana 4 beta 3. And I have just one URL rewrite rule (pictured). Were you getting the same error when it was not working for you? https://lh3.googleusercontent.com/-oDiu_ncjJlA/VMrEJL-Qj_I/Aic/so2IvrgTQbY/s1600/RewriteRule.png On Thursday, January 29, 2015 at 3:31:56 PM UTC-8, Cijo Thomas wrote: Can you show your URL rewrite rules? Also, are you using Kibana 4 beta 3? On Thu, Jan 29, 2015 at 1:09 PM, Konstantin Erman kon...@gmail.com wrote: Unfortunately I could not replicate your success :-( Let me show you what I did, in case you notice any difference from your case: https://lh6.googleusercontent.com/-HzQRKhGl9ag/VMqfkWnSF8I/Ah0/SsXrJlQ2vW8/s1600/Output_Caching.png https://lh6.googleusercontent.com/-V2VTx-iT888/VMqf0K7jChI/Ah8/qC7umA0XP_U/s1600/AppPool1.png https://lh6.googleusercontent.com/-4jL3Hyoq0QY/VMqgF7d0-II/AiE/77VOeAZP2e0/s1600/AppPool2.png https://lh5.googleusercontent.com/-aBFCh_BZKn4/VMqgnM9ejhI/AiM/zxnsdD-VK8U/s1600/Error.png Any ideas what I may be missing? Thanks! Konstantin On Thursday, January 29, 2015 at 10:13:40 AM UTC-8, Cijo Thomas wrote: I have been fighting with this for quite some time and finally found the workaround. Let me know if it helps you! On Thu, Jan 29, 2015 at 10:12 AM, Konstantin Erman kon...@gmail.com wrote: Thank you for the good news! I'm a little swamped currently, but I will definitely give it a try when I get a minute. Just to make sure - "disable Output cache for the website" - where is that in the IIS Management Console? On Wednesday, January 28, 2015 at 4:38:01 PM UTC-8, Cijo Thomas wrote: It's possible to use IIS with the following steps. 1) Disable Output cache for the website you are using as reverse proxy. 2) Run the website in a new app pool which does not have any managed code. With the above two steps, Kibana 4 runs fine with IIS as reverse proxy. On Saturday, December 27, 2014 at 4:19:31 PM UTC-8, Konstantin Erman wrote: We currently use Kibana 3 hosted in IIS behind an IIS reverse proxy for authentication. Naturally we are looking at Kibana 4 Beta 3, expecting it to replace Kibana 3 soon. Kibana 4 is self-hosted and works nicely when accessed directly, but we need authentication, and whatever I do I cannot make it work from behind the reverse proxy! Sooner or later I get a 401 accessing some internal resource. I wonder if anybody has hit a similar problem and has any insight on how to make it work. We cannot use Shield as its price is way beyond our bounds. Thanks! Konstantin -- Warm Regards, Cijo Thomas +1 3125606441
Re: Kibana 4 behind reverse proxy. Is it possible?
I have been fighting with this for quite some time and finally found the workaround. Let me know if it helps you! On Thu, Jan 29, 2015 at 10:12 AM, Konstantin Erman kon...@gmail.com wrote: Thank you for the good news! I'm a little swamped currently, but I will definitely give it a try when I get a minute. Just to make sure - "disable Output cache for the website" - where is that in the IIS Management Console? On Wednesday, January 28, 2015 at 4:38:01 PM UTC-8, Cijo Thomas wrote: It's possible to use IIS with the following steps. 1) Disable Output cache for the website you are using as reverse proxy. 2) Run the website in a new app pool which does not have any managed code. With the above two steps, Kibana 4 runs fine with IIS as reverse proxy. On Saturday, December 27, 2014 at 4:19:31 PM UTC-8, Konstantin Erman wrote: We currently use Kibana 3 hosted in IIS behind an IIS reverse proxy for authentication. Naturally we are looking at Kibana 4 Beta 3, expecting it to replace Kibana 3 soon. Kibana 4 is self-hosted and works nicely when accessed directly, but we need authentication, and whatever I do I cannot make it work from behind the reverse proxy! Sooner or later I get a 401 accessing some internal resource. I wonder if anybody has hit a similar problem and has any insight on how to make it work. We cannot use Shield as its price is way beyond our bounds. Thanks! Konstantin -- Warm Regards, Cijo Thomas +1 3125606441
Re: Kibana 4 behind reverse proxy. Is it possible?
Can you show your URL rewrite rules? Also, are you using Kibana 4 beta 3? On Thu, Jan 29, 2015 at 1:09 PM, Konstantin Erman kon...@gmail.com wrote: Unfortunately I could not replicate your success :-( Let me show you what I did, in case you notice any difference from your case: https://lh6.googleusercontent.com/-HzQRKhGl9ag/VMqfkWnSF8I/Ah0/SsXrJlQ2vW8/s1600/Output_Caching.png https://lh6.googleusercontent.com/-V2VTx-iT888/VMqf0K7jChI/Ah8/qC7umA0XP_U/s1600/AppPool1.png https://lh6.googleusercontent.com/-4jL3Hyoq0QY/VMqgF7d0-II/AiE/77VOeAZP2e0/s1600/AppPool2.png https://lh5.googleusercontent.com/-aBFCh_BZKn4/VMqgnM9ejhI/AiM/zxnsdD-VK8U/s1600/Error.png Any ideas what I may be missing? Thanks! Konstantin On Thursday, January 29, 2015 at 10:13:40 AM UTC-8, Cijo Thomas wrote: I have been fighting with this for quite some time and finally found the workaround. Let me know if it helps you! On Thu, Jan 29, 2015 at 10:12 AM, Konstantin Erman kon...@gmail.com wrote: Thank you for the good news! I'm a little swamped currently, but I will definitely give it a try when I get a minute. Just to make sure - "disable Output cache for the website" - where is that in the IIS Management Console? On Wednesday, January 28, 2015 at 4:38:01 PM UTC-8, Cijo Thomas wrote: It's possible to use IIS with the following steps. 1) Disable Output cache for the website you are using as reverse proxy. 2) Run the website in a new app pool which does not have any managed code. With the above two steps, Kibana 4 runs fine with IIS as reverse proxy. On Saturday, December 27, 2014 at 4:19:31 PM UTC-8, Konstantin Erman wrote: We currently use Kibana 3 hosted in IIS behind an IIS reverse proxy for authentication. Naturally we are looking at Kibana 4 Beta 3, expecting it to replace Kibana 3 soon. Kibana 4 is self-hosted and works nicely when accessed directly, but we need authentication, and whatever I do I cannot make it work from behind the reverse proxy! Sooner or later I get a 401 accessing some internal resource. I wonder if anybody has hit a similar problem and has any insight on how to make it work. We cannot use Shield as its price is way beyond our bounds. Thanks! Konstantin -- Warm Regards, Cijo Thomas +1 3125606441
Re: Kibana 4 behind reverse proxy. Is it possible?
It's possible to use IIS with the following steps. 1) Disable Output cache for the website you are using as reverse proxy. 2) Run the website in a new app pool which does not have any managed code. With the above two steps, Kibana 4 runs fine with IIS as reverse proxy. On Saturday, December 27, 2014 at 4:19:31 PM UTC-8, Konstantin Erman wrote: We currently use Kibana 3 hosted in IIS behind an IIS reverse proxy for authentication. Naturally we are looking at Kibana 4 Beta 3, expecting it to replace Kibana 3 soon. Kibana 4 is self-hosted and works nicely when accessed directly, but we need authentication, and whatever I do I cannot make it work from behind the reverse proxy! Sooner or later I get a 401 accessing some internal resource. I wonder if anybody has hit a similar problem and has any insight on how to make it work. We cannot use Shield as its price is way beyond our bounds. Thanks! Konstantin
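For readers wiring this up, a minimal sketch of what the IIS reverse-proxy rule might look like in web.config. This is an assumption-laden illustration, not the rule from the thread (that was only shown as a screenshot): it assumes the URL Rewrite and ARR modules are installed and that Kibana 4 listens on its default port 5601.

```xml
<!-- Sketch only: proxy all requests to a local Kibana 4 (default port 5601).
     Pair this with an app pool set to "No Managed Code" and with output
     caching disabled for the site, per the two steps above. -->
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <rule name="KibanaProxy" stopProcessing="true">
          <match url="(.*)" />
          <action type="Rewrite" url="http://localhost:5601/{R:1}" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>
```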
Out of memory on start with 38GB index
(BufferedUpdatesStream.java:287) at org.apache.lucene.index.IndexWriter.applyAllDeletesAndUpdates(IndexWriter.java:3271) at org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:3262) at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:421) at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:292) at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:267) at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:257) at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:171) at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:118) at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:58) at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176) at org.apache.lucene.search.ReferenceManager.maybeRefresh(ReferenceManager.java:225) at org.elasticsearch.index.engine.internal.InternalEngine.refresh(InternalEngine.java:796) at org.elasticsearch.index.engine.internal.InternalEngine.delete(InternalEngine.java:692) at org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:798) at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:268) at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) [2015-01-14 12:01:32,238][DEBUG][index.service] [Saint Elmo] [mailspool] [1] closing... 
(reason: [engine failure, message [refresh failed][OutOfMemoryError[Java heap space]]]) [2015-01-14 12:01:32,238][DEBUG][index.shard.service ] [Saint Elmo] [mailspool][1] state: [RECOVERING]-[CLOSED], reason [engine failure, message [refresh failed][OutOfMemoryError[Java heap space]]] [2015-01-14 12:01:32,315][DEBUG][index.service] [Saint Elmo] [mailspool] [1] closed (reason: [engine failure, message [refresh failed][OutOfMemoryError[Java heap space]]]) I tried adding a few settings to my elasticsearch.yml as suggested in the referenced issue:

index.load_fixed_bitset_filters_eagerly: false
index.warmer.enabled: false
indices.breaker.total.limit: 30%

But none of these settings seems to help. Our mapping is visible here: http://git.blue-mind.net/gitlist/bluemind/blob/master/esearch/config/templates/mailspool.json It is used to store a full-text index of emails. It uses a parent/child structure: the msgBody type contains the full text of the messages and attachments; the msg type contains user flags (unread, important, the folder it is stored in, etc.). We use this structure because msg is updated often: mails are frequently marked as read or moved. The msgBody can be pretty big, so we don't want to update the whole document when a simple email flag is changed. Does this kind of index structure point to a particular bug or required setting? Any rule of thumb for sizing memory relative to index size on disk? Regards, Thomas.
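On the rule-of-thumb question: the trace fails allocating a FixedBitSet, and a bitset over a segment costs roughly one bit per document. So a rough, assumption-laden estimate of the heap cost of cached bitset filters (as used for parent/child parent lookups) is max_doc / 8 bytes per filter, ignoring per-object overhead:

```python
# Back-of-envelope heap estimate (assumption: ~1 bit per document per
# cached bitset filter, i.e. max_doc / 8 bytes, ignoring overhead).
def bitset_bytes(max_doc, num_filters=1):
    return (max_doc // 8) * num_filters

# e.g. 200 million mail documents -> roughly 25 MB per cached filter;
# several filters across many shards multiply this quickly.
```

This is only an order-of-magnitude sketch; actual usage depends on segment count, filter cache settings, and how many distinct filters get cached.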
Re: out of memory at startup with large index and parent/child relation
Hi, By removing all my translog files, ES can start without error. On Wednesday, January 14, 2015 at 2:56:48 PM UTC+1, Thomas Cataldo wrote: Hi, I encounter a problem with a large index (38GB) that prevents ES 1.4.2 from starting. The problem looks pretty similar to the one in https://github.com/elasticsearch/elasticsearch/issues/8394 I tried some of the recommendations from that post (and linked ones):

index.load_fixed_bitset_filters_eagerly: false
index.warmer.enabled: false
indices.breaker.total.limit: 30%

And even with that, my server does not start [1]. I uploaded the mapping for the index to a gist: https://gist.github.com/tcataldo/c0b6b3dfec9823bf6523 I tried several OS memory / ES heap combinations, the biggest being 48GiB for the operating system and 32GiB for the ES heap, and it still fails. Any idea, or a link to an open issue I could follow? Regards, Thomas. 1. debug output: [2015-01-14 12:01:55,740][DEBUG][indices.cluster ] [Saint Elmo] [mailspool][0] creating shard [2015-01-14 12:01:55,741][DEBUG][index.service] [Saint Elmo] [mailspool] creating shard_id [0] [2015-01-14 12:01:56,041][DEBUG][index.deletionpolicy ] [Saint Elmo] [mailspool][0] Using [keep_only_last] deletion policy [2015-01-14 12:01:56,041][DEBUG][index.merge.policy ] [Saint Elmo] [mailspool][0] using [tiered] merge mergePolicy with expunge_deletes_allowed[10.0], floor_segment[2mb], max_merge_at_once[10], max_merge_at_once_explicit[30], max_merged_segment[5gb], segments_per_tier[10.0], reclaim_deletes_weight[2.0] [2015-01-14 12:01:56,041][DEBUG][index.merge.scheduler] [Saint Elmo] [mailspool][0] using [concurrent] merge scheduler with max_thread_count[2], max_merge_count[4] [2015-01-14 12:01:56,042][DEBUG][index.shard.service ] [Saint Elmo] [mailspool][0] state: [CREATED] [2015-01-14 12:01:56,043][DEBUG][index.translog ] [Saint Elmo] [mailspool][0] interval [5s], flush_threshold_ops [2147483647], flush_threshold_size [200mb], flush_threshold_period [30m] [2015-01-14
12:01:56,044][DEBUG][index.shard.service ] [Saint Elmo] [mailspool][0] state: [CREATED]-[RECOVERING], reason [from gateway] [2015-01-14 12:01:56,044][DEBUG][index.gateway] [Saint Elmo] [mailspool][0] starting recovery from local ... [2015-01-14 12:01:56,048][DEBUG][river.cluster] [Saint Elmo] processing [reroute_rivers_node_changed]: execute [2015-01-14 12:01:56,048][DEBUG][river.cluster] [Saint Elmo] processing [reroute_rivers_node_changed]: no change in cluster_state [2015-01-14 12:01:56,048][DEBUG][cluster.service ] [Saint Elmo] processing [shard-failed ([mailspool][3], node[gOgAuHo4SXyfyuPpws0Usw], [P], s[INITIALIZING]), reason [engine failure, \ message [refresh failed][OutOfMemoryError[Java heap space: done applying updated cluster_state (version: 4) [2015-01-14 12:01:56,062][DEBUG][index.engine.internal] [Saint Elmo] [mailspool][0] starting engine [2015-01-14 12:02:19,701][WARN ][index.engine.internal] [Saint Elmo] [mailspool][0] failed engine [refresh failed] java.lang.OutOfMemoryError: Java heap space at org.apache.lucene.util.FixedBitSet.init(FixedBitSet.java:187) at org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(MultiTermQueryWrapperFilter.java:104) at org.elasticsearch.index.cache.filter.weighted.WeightedFilterCache$FilterCacheFilterWrapper.getDocIdSet(WeightedFilterCache.java:177) at org.elasticsearch.common.lucene.search.OrFilter.getDocIdSet(OrFilter.java:55) at org.elasticsearch.common.lucene.search.ApplyAcceptedDocsFilter.getDocIdSet(ApplyAcceptedDocsFilter.java:46) at org.apache.lucene.search.FilteredQuery$1.scorer(FilteredQuery.java:130) at org.apache.lucene.search.FilteredQuery$RandomAccessFilterStrategy.filteredScorer(FilteredQuery.java:542) at org.apache.lucene.search.FilteredQuery$1.scorer(FilteredQuery.java:136) at org.apache.lucene.search.QueryWrapperFilter$1.iterator(QueryWrapperFilter.java:59) at org.apache.lucene.index.BufferedUpdatesStream.applyQueryDeletes(BufferedUpdatesStream.java:554) at 
org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:287) at org.apache.lucene.index.IndexWriter.applyAllDeletesAndUpdates(IndexWriter.java:3271) at org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:3262) at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:421) at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:292) at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:267) at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:257
out of memory at startup with large index and parent/child relation
Hi, I encounter a problem with a large index (38GB) that prevents ES 1.4.2 from starting. The problem looks pretty similar to the one in https://github.com/elasticsearch/elasticsearch/issues/8394 I tried some of the recommendations from this post (and linked ones):

index.load_fixed_bitset_filters_eagerly: false
index.warmer.enabled: false
indices.breaker.total.limit: 30%

And even with that, my server does not start [1]. I uploaded the mapping for the index to gist: https://gist.github.com/tcataldo/c0b6b3dfec9823bf6523 I tried several OS memory / ES heap combinations, the biggest being 48GiB for the operating system and 32GiB for the ES heap, and it still fails. Any idea or link to an open issue I could follow? Regards, Thomas.

1. debug output:
[2015-01-14 12:01:55,740][DEBUG][indices.cluster ] [Saint Elmo] [mailspool][0] creating shard
[2015-01-14 12:01:55,741][DEBUG][index.service] [Saint Elmo] [mailspool] creating shard_id [0]
[2015-01-14 12:01:56,041][DEBUG][index.deletionpolicy ] [Saint Elmo] [mailspool][0] Using [keep_only_last] deletion policy
[2015-01-14 12:01:56,041][DEBUG][index.merge.policy ] [Saint Elmo] [mailspool][0] using [tiered] merge mergePolicy with expunge_deletes_allowed[10.0], floor_segment[2mb], max_merge_at_once[10], max_merge_at_once_explicit[30], max_merged_segment[5gb], segments_per_tier[10.0], reclaim_deletes_weight[2.0]
[2015-01-14 12:01:56,041][DEBUG][index.merge.scheduler] [Saint Elmo] [mailspool][0] using [concurrent] merge scheduler with max_thread_count[2], max_merge_count[4]
[2015-01-14 12:01:56,042][DEBUG][index.shard.service ] [Saint Elmo] [mailspool][0] state: [CREATED]
[2015-01-14 12:01:56,043][DEBUG][index.translog ] [Saint Elmo] [mailspool][0] interval [5s], flush_threshold_ops [2147483647], flush_threshold_size [200mb], flush_threshold_period [30m]
[2015-01-14 12:01:56,044][DEBUG][index.shard.service ] [Saint Elmo] [mailspool][0] state: [CREATED]-[RECOVERING], reason [from gateway]
[2015-01-14 12:01:56,044][DEBUG][index.gateway] [Saint Elmo] [mailspool][0] starting recovery from local ...
[2015-01-14 12:01:56,048][DEBUG][river.cluster] [Saint Elmo] processing [reroute_rivers_node_changed]: execute
[2015-01-14 12:01:56,048][DEBUG][river.cluster] [Saint Elmo] processing [reroute_rivers_node_changed]: no change in cluster_state
[2015-01-14 12:01:56,048][DEBUG][cluster.service ] [Saint Elmo] processing [shard-failed ([mailspool][3], node[gOgAuHo4SXyfyuPpws0Usw], [P], s[INITIALIZING]), reason [engine failure, message [refresh failed][OutOfMemoryError[Java heap space: done applying updated cluster_state (version: 4)
[2015-01-14 12:01:56,062][DEBUG][index.engine.internal] [Saint Elmo] [mailspool][0] starting engine
[2015-01-14 12:02:19,701][WARN ][index.engine.internal] [Saint Elmo] [mailspool][0] failed engine [refresh failed]
java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.util.FixedBitSet.init(FixedBitSet.java:187)
    at org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(MultiTermQueryWrapperFilter.java:104)
    at org.elasticsearch.index.cache.filter.weighted.WeightedFilterCache$FilterCacheFilterWrapper.getDocIdSet(WeightedFilterCache.java:177)
    at org.elasticsearch.common.lucene.search.OrFilter.getDocIdSet(OrFilter.java:55)
    at org.elasticsearch.common.lucene.search.ApplyAcceptedDocsFilter.getDocIdSet(ApplyAcceptedDocsFilter.java:46)
    at org.apache.lucene.search.FilteredQuery$1.scorer(FilteredQuery.java:130)
    at org.apache.lucene.search.FilteredQuery$RandomAccessFilterStrategy.filteredScorer(FilteredQuery.java:542)
    at org.apache.lucene.search.FilteredQuery$1.scorer(FilteredQuery.java:136)
    at org.apache.lucene.search.QueryWrapperFilter$1.iterator(QueryWrapperFilter.java:59)
    at org.apache.lucene.index.BufferedUpdatesStream.applyQueryDeletes(BufferedUpdatesStream.java:554)
    at org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:287)
    at org.apache.lucene.index.IndexWriter.applyAllDeletesAndUpdates(IndexWriter.java:3271)
    at org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:3262)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:421)
    at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:292)
    at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:267)
    at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:257)
    at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:171)
    at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:118)
RepositoryMissingException when restoring into a new cluster
I'm using the snapshot/restore feature of Elasticsearch, together with the Azure plugin, to back up snapshots to Azure blob storage. Everything works when taking snapshots from a cluster and restoring to the same cluster. Now I'm in a situation where I want to restore an entirely new cluster (let's call that cluster B) from a snapshot generated from cluster A. When I run a restore request on cluster B, I get a 404. Doing a _status call on the snapshot, I get the same error: {"error":"RepositoryMissingException[[elasticsearch_logs] missing]","status":404} The new cluster is configured with the Azure plugin and the same settings for Azure. I guess the error is caused by the fact that Elasticsearch generates some metadata about the snapshots and stores it locally in the _snapshot index, and this index is not on the new cluster. The same error happens if I delete the data dir on cluster A and try to restore cluster A from a snapshot. How would I deal with a situation like this? -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b5a0147c-cc38-4947-8530-0c66eb00fc2a%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: RepositoryMissingException when restoring into a new cluster
That was exactly what I was missing. I didn't create the repository named elasticsearch_logs on cluster B. After I created it, the backup runs smoothly. Thanks, David!

On Wednesday, January 7, 2015 8:31:08 AM UTC+1, David Pilato wrote: Did you create the repository on cluster B? How? -- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 7 Jan 2015 at 08:19, Thomas Ardal wrote: [...]
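For readers hitting the same 404, a minimal sketch of the fix David suggested: register the same repository on cluster B before restoring. The repository name elasticsearch_logs comes from the thread; the container and base_path values below are placeholders, not settings from the original posts.

```shell
# Hypothetical example: register the Azure repository on cluster B so the
# restore can find the snapshots already stored in blob storage.
curl -XPUT 'http://localhost:9200/_snapshot/elasticsearch_logs' -d '{
  "type": "azure",
  "settings": {
    "container": "my-backups",
    "base_path": "backups"
  }
}'
```

Once the repository is registered, the snapshot list and restore requests on cluster B should resolve the same snapshots cluster A created.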
Re: Elasticsearch Frontend webapp and boilerplate query code
Yes. Let's say that you want to represent a pie graph with some sort of aggregated data in it, extracted from elasticsearch. Instead of writing the query in javascript or having it client side in the code, we need something like a simple GET API call, for instance getMyPieData(), from another service to fetch the data to represent that information. If we let elasticsearch do that, we may want to use search templates: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-template.html#pre-registered-templates If we also want to cache that query, we may use the elasticsearch shard query cache: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-shard-query-cache.html#index-modules-shard-query-cache That is more or less what elasticsearch provides for my use case, afaik. My question is whether there is something else I can do to cache and hide my queries apart from the aforementioned solutions, which bind me to elasticsearch (some separate module, technology etc. based on best practices), so that if I later want to replace elasticsearch with something else I don't affect my frontend code and, more importantly, don't constantly hit elasticsearch for that data. I'm trying to first verify that I will not reinvent the wheel by building my own solution. Thank you again

On Friday, 2 January 2015 17:04:59 UTC+2, Thomas wrote: [...]
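A hedged sketch of the pre-registered search template flow discussed above, using the indexed-template endpoint available in recent 1.x releases. The template id pie_clicks, the index name metrics, and the field/parameter names are invented for illustration; this shows how the frontend-facing service only ships parameters, never the query itself.

```shell
# 1) Register the template once (assumed names; adapt to your mapping):
curl -XPOST 'http://localhost:9200/_search/template/pie_clicks' -d '{
  "template": {
    "query": { "term": { "campaign": "{{campaign_id}}" } },
    "aggs":  { "day_clicks": { "sum": { "field": "clicks" } } }
  }
}'

# 2) The backing service calls it by id with parameters only:
curl -XGET 'http://localhost:9200/metrics/_search/template' -d '{
  "template": { "id": "pie_clicks" },
  "params":   { "campaign_id": "42" }
}'
```

This keeps the query body out of the javascript entirely; swapping elasticsearch later means reimplementing only the service endpoint, not the frontend.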
Elasticsearch Frontend webapp and boilerplate query code
Hi, I wish everybody a happy new year, all the best for 2015, and continued success for ES. In our project we intend to create a simple webapp that will query elasticsearch for insights. We do not want to query elasticsearch directly, for two reasons: security, and avoiding boilerplate query code so we can decouple it. What is the best way to achieve that? We are currently evaluating building the frontend as a python/django project. Has anyone faced a similar task, and is it possible to share some thoughts? In other situations NGINX was a solution for security, but what is the most well-established way to avoid having all the boilerplate query code client side (e.g. in javascript)? Finally, there are cases where some caching may be needed to avoid hitting elasticsearch constantly for the same data; how is this tackled? Do we need to build our own module to do all this? Thank you in advance Thomas
Re: using a nested object field in a multi_match query
On Wednesday, December 10, 2014 4:33:12 PM UTC-3, thomas@beatport.com wrote: [...]

In our case we switched the mapping type from nested to object, and then this worked. I'm aware of the implications of this switch; we don't need the features provided by nested. Others may, of course. Thanks. -Tom
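For those who do need nested, an untested sketch of an alternative to remapping: keep author as nested and pair the multi_match on the top-level fields with an explicit nested clause inside a bool query. The field names and boosts are taken from the thread; the index name books is made up.

```shell
# Hypothetical: boost title via multi_match, reach author.name via a
# nested clause (nested fields are not resolved by multi_match itself).
curl -XGET 'http://localhost:9200/books/_search' -d '{
  "query": {
    "bool": {
      "should": [
        { "multi_match": {
            "query": "China Mieville",
            "operator": "and",
            "fields": [ "_all", "title^2" ] } },
        { "nested": {
            "path": "author",
            "query": { "match": {
              "author.name": { "query": "China Mieville", "boost": 1.5 } } } } }
      ]
    }
  }
}'
```

The trade-off is that "and" semantics no longer span the top-level and nested fields together, so scoring behaves slightly differently from a true cross-field multi_match.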
Re: using a nested object field in a multi_match query
On Monday, August 11, 2014 1:29:56 PM UTC-4, Mike Topper wrote: Hello, I'm having trouble coming up with how to supply a field within a nested object in the multi_match fields list. I'm using the multi_match query in order to perform query-time field boosting, but something like: query: { multi_match: { query: "China Mieville", operator: "and", fields: [ "_all", "title^2", "author.name^1.5" ] } } doesn't seem to work. The title is boosted fine, but in fact if I take out the _all field I can see that author.name is never being used. Is there a way to supply nested fields within a multi_match query?

I've just been bit by this too. Anyone know how to make this work? Thanks. -Tom
Re: Performance issue while indexing lot of documents
On Thu, Nov 6, 2014 at 11:09 AM, Moshe Recanati re.mo...@gmail.com wrote: // bulkRequest = client.prepareBulk();

Please fix your code to clearly send only 1000 documents per bulk request. Because that line is commented out, it looks like you are just growing the same bulk request and executing it over and over; re-create it with client.prepareBulk() after each execute so every batch starts empty.
Re: Modify the index setting after the index is created? What's the function of search_quote_analyzer?
Bump, I'm having the same problem.

On Thursday, June 12, 2014 10:32:14 PM UTC-5, Ivan Ji wrote: Hi all, I want to modify one field's search analyzer from standard to keyword after the index is created. So I try to PUT the mapping:

$ curl -XPUT 'http://localhost:9200/qindex/main/_mapping' -d '
{
  "main": {
    "properties": {
      "name": { "type": "string", "index": "analyzed", "index_analyzer": "filename_ngram", "search_analyzer": "keyword" }
    }
  }
}'

The operation seems to succeed. I expected it might conflict; in what situations would a conflict occur? This is my first question. Anyway, I then try to get the mapping out (partial):

"name": { "type": "string", "index_analyzer": "filename_ngram", "search_analyzer": "keyword", "include_in_all": true, "search_quote_analyzer": "standard" }

So I am wondering whether my operation succeeded, and what is the function of search_quote_analyzer? It still remains standard; does it matter? Could anyone answer me these questions? Cheers, Ivan
Floating point precision in response
Hi, I have a quick question with regard to how numeric values appear in responses. I perform a sum aggregation, and in the curl response the number is shown as follows:

aggs: { day_clicks: { sum: { field: "clicks" } } }

response ... doc_count: 384, day_clicks: { value: 2.7372883E7 },

Notice the E7: the value is printed in scientific (floating point) notation instead of as the plain number, i.e. ... value: 27372883 ... Has anyone faced a similar case? At what level is this happening, in elasticsearch's response or later? I have noticed in marvel/sense that the response comes in this way, so the transformation happens client side. Is there a way to change that in the response of ES? Thank you very much Thomas
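Worth noting for this thread: 2.7372883E7 is ordinary scientific notation for exactly 27372883, and it is a valid JSON number, so nothing is lost in the response itself. Rendering it as a plain integer is a client-side formatting step, for example:

```shell
# The exponent form parses to the same exact value; format it client side.
printf '%.0f\n' 2.7372883E7   # prints 27372883
```

Any JSON library will do the same conversion when it parses the number, so the fix belongs in whatever renders the response, not in ES.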
Re: Using ES as a primary datastore.
Hi, You first have to calculate the volume you will keep in one shard, then break your total volume into the number of shards you will maintain, and then scale accordingly into a number of nodes; at the least, as your volumes grow you should grow your cluster as well. It is difficult to predict what problems may arise, your case is too generic. What will be the usage of the cluster? What queries will you perform? Will you mostly do indexing and occasionally query, or will you query your data intensively? Most importantly, you need to think about how you will partition your data: will you have one index, or multiple indices like a logstash approach? Maybe check here: https://www.found.no/foundation/sizing-elasticsearch/ What will you do with data older than a year, delete it? Can you afford to lose data? Will you keep backups? IMHO, these are some of the questions you must answer in order to see whether such an approach suits your needs. It is about hardware, and the structure and partitioning of your data. Thomas

On Wednesday, 17 September 2014 13:41:55 UTC+3, P Suman wrote: Hello, We are planning to use ES as a primary datastore. Here is my usecase: We receive a million transactions per day (all are inserts). Each transaction is around 500KB in size and has 10 fields; we should be able to search on all 10 fields. We want to keep around 1 yr worth of data, which comes to around 180TB. Can you please let me know any problems that might arise if I use elasticsearch as the primary datastore. Regards, Suman
Re: Elasticsearch script execution on missing field
I think the correct way to check for a missing field is the following: doc['countryid'].empty == true Check also: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-scripting.html#_document_fields Btw, why such an old version of ES? Thomas

On Wednesday, 17 September 2014 13:53:08 UTC+3, Manoj wrote: I am currently using ES version 0.19. For a feature requirement, I wanted to execute a script on a missing field through a terms facet. The curl I tried is something like this:

{ query: { term: { content: "deep" } }, filter: { and: { filters: [ { type: { value: "twitter" } } ] } }, facets: { loca: { terms: { field: "countryid", script: "doc['countryid']==null?1:doc['countryid'].value" } } } }

I assumed that missing fields can be detected by the condition doc['countryid']==null. But it looks like this is not the way to identify a missing field in a script :-( I always receive a response where the document is counted as missing:

{ took: 1, timed_out: false, _shards: { total: 6, successful: 6, failed: 0 }, hits: { total: 0, max_score: null, hits: [ ] }, facets: { loca: { _type: "terms", missing: 1, total: 0, other: 0, terms: [ ] } } }

Could anybody help me to get this correct. Thanks in advance, Manoj
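Putting the suggested check into the facet from the question gives something like the sketch below. The index is not specified, the sentinel value 1 comes from the original post, and behaviour on ES 0.19 is not verified here.

```shell
# Hedged rewrite: use .empty to detect the missing field instead of
# comparing the doc value against null.
curl -XGET 'http://localhost:9200/_search' -d '{
  "query": { "term": { "content": "deep" } },
  "facets": {
    "loca": {
      "terms": {
        "field": "countryid",
        "script": "doc[\"countryid\"].empty ? 1 : doc[\"countryid\"].value"
      }
    }
  }
}'
```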
Re: Indexing is becoming slow, what to look for?
By setting this parameter, some additional questions of mine have come up: if I set indices.memory.index_buffer_size on a specific node and not on all nodes of the cluster, will this configuration be taken into account by all nodes? Is it going to be cluster wide, or only for index operations on the specific node? So do I need to set this up on all nodes one by one, do a restart, and then see the effects? Finally, if we index data into an index of 10 shards and I have 5 nodes, that means a particular node will index into 2 shards; will indices.memory.index_buffer_size then refer to those specific two shards? Thank you very much Thomas

On Friday, 5 September 2014 11:44:42 UTC+3, Thomas wrote: [...]
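To make the node-level nature of the setting discussed above concrete, an illustrative sketch only: the buffer is configured per node in elasticsearch.yml, so it has to be applied on every node, each followed by a restart. The file path and the 20% value are assumptions, not recommendations from the thread.

```shell
# Hypothetical: apply on each node in turn, then restart that node.
echo 'indices.memory.index_buffer_size: 20%' >> /etc/elasticsearch/elasticsearch.yml
```

The percentage is taken of each node's own heap and shared among the active shards that node hosts, which is why it is not a cluster-wide dynamic setting.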
Indexing is becoming slow, what to look for?
Hi, I have been performing indexing operations in my elasticsearch cluster for some time now. Suddenly I have been facing some latency while indexing, and I'm trying to find the reason for it. Details: I have a custom process which uploads a number of logs every interval with the bulk API. This process used to take about 5-7 minutes every time. For some reason, in the last days I noticed that the exact same procedure, with the same volumes, takes about 15-20 minutes. While manipulating the data I run update operations through scripting (groovy). My cluster is a set of 5 nodes; my first impression was that I needed to scale, therefore I added an extra node. The problem seemed solved, but after a day I face the same issue again. Is it possible to give some ideas about what to check, or what seems to be the issue? How is it possible to check if a background process is running or creating any issues (expunge etc.)? Does anyone have any similar problems? Any help appreciated; let me know what info to share. ES version is 1.3.1, JDK is 1.7. Thanks
Re: aggregations
What version of ES have you been using? Afaik, in later versions you can control the percentage of heap space fielddata may utilize with the update settings API. Try to increase it a bit and see what happens; the default is 60%, increase it for example to 70%: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.html#fielddata-circuit-breaker T.

On Wednesday, 3 September 2014 19:58:02 UTC+3, navdeep agarwal wrote: Hi, I am a bit new to Elasticsearch. While testing elasticsearch's aggregation feature, I always hit "data too large". I understand that aggregations are very memory intensive, so is there any way to query in ES where one query's output can be fed into the aggregation, so that the number of inputs to the aggregation is limited? I have used filters and queries before aggregations. I have an index of around 60 GB on 5 shards. Queries I tried:

GET **/_search { query: {term: { file_sha2: { value: } }}, aggs: { top_filename: { max: { field: portalid } } } }

---

GET /_search { aggs: { top filename: { filter: {term: { file_sha2: xx }}, aggs: { top_filename: { max: { field: portalid } } } } } }

Thanks in advance.
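A sketch of the suggestion above, assuming an ES version where the fielddata circuit breaker limit is a dynamic cluster setting (the 70% figure is the example value from the reply):

```shell
# Raise the fielddata breaker limit cluster-wide without a restart.
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "persistent": { "indices.breaker.fielddata.limit": "70%" }
}'
```

Raising the limit only buys headroom; if the aggregations genuinely need more fielddata than the heap can hold, filtering the query first (as the poster is already doing) or adding heap/nodes is the durable fix.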
Re: Indexing is becoming slow, what to look for?
Thx Michael, I will read the post in detail and let you know of any findings. Thomas.

On Friday, 5 September 2014 11:44:42 UTC+3, Thomas wrote: [...]
Re: Indexing is becoming slow, what to look for?
Hi, I wanted to clarify something from the blog post you mentioned. You specify that, based on calculations, we should give at most ~512 MB of indexing buffer per active shard. What I wanted to ask is: what is meant by the term "active"? Does it mean the primary only, or not? Thank you again

On Friday, 5 September 2014 11:44:42 UTC+3, Thomas wrote: [...]
Re: Indexing is becoming slow, what to look for?
Got it, thanks. On Friday, 5 September 2014 11:44:42 UTC+3, Thomas wrote: ...
Cloud-aws version for 1.3.1 of elasticsearch
Hi, I wanted to ask whether the cloud-aws plugin version for elasticsearch 1.3.1 is 2.1.1, judging by the GitHub page: https://github.com/elasticsearch/elasticsearch-cloud-aws/tree/es-1.3. How come the plugin version for elasticsearch 1.3.1 goes backwards? For elasticsearch 1.2.x the cloud-aws version is 2.2.0. Is this correct? Thank you very much, Thomas
Re: Integration testing a native script
Hi, I have tried the same approach and it worked for me: copy the script you want to test, then run your integration test. I do the following steps.

1) Set up the required paths for elasticsearch:

    final Settings settings = settingsBuilder()
        .put("http.enabled", true)
        .put("path.conf", confDir)
        .put("path.data", dataDir)
        .put("path.work", workDir)
        .put("path.logs", logsDir)
        .build();

2) Copy your scripts to the appropriate location.

3) Fire up a local node:

    node = nodeBuilder().local(true).settings(settings).clusterName(nodeName).node();
    node.start();

If you first start the node and then add the script, it might not work: I think ES scans for new scripts once per minute, and the integration test does not allow this to happen, hence you should first copy your script and then start the node. Hope it helps. Thomas. On Wednesday, 30 July 2014 12:31:06 UTC+3, Nick T wrote: Is there a way to have a native Java script accessible in integration tests? In my integration tests I am creating a test node in the /tmp folder. I've tried copying the script to /tmp/plugins/scripts, but that was quite hopeful and unfortunately does not work. Desperate for help. Thanks
Re: Integration testing a native script
I noticed that you mention a native Java script, so have you implemented it as a plugin? If so, try the following in your settings:

    final Settings settings = settingsBuilder()
        ...
        .put("plugin.types", YourPlugin.class.getName())

Thomas. On Wednesday, 30 July 2014 12:31:06 UTC+3, Nick T wrote: ...
Re: 1.1.1 to 1.3 upgrade possible?
Thanks Mark, I can see that, as you mentioned, the new version 1.3.1 has been released. Thomas. On Monday, 28 July 2014 11:11:57 UTC+3, Thomas wrote: ...
1.1.1 to 1.3 upgrade possible?
Hi, I maintain a working cluster which is on version 1.1.1, and I'm planning to upgrade to version 1.3.0, which was released last week. I wanted to ask whether the upgrade is compatible, whether there are any known issues/problems, and what to expect in general. Thank you very much, Thomas
Re: 1.1.1 to 1.3 upgrade possible?
Great, thanks for your reply Mark. On Monday, 28 July 2014 11:11:57 UTC+3, Thomas wrote: ...
Aggregation on parent/child documents
Hi, I wanted to ask whether it is possible to perform aggregations combining parent/child documents, something similar to the nested and reverse_nested aggregations. It would be very helpful to be able to create, for instance, buckets based on parent document fields and get back aggregations that contain fields of both parent and child documents combined. Any thoughts, or future features related to the above planned for upcoming releases? Thank you, Thomas
Re: Aggregation on parent/child documents
Hi Adrien, and thank you for the reply. This is exactly what I had in mind, alongside the reverse-search equivalent of reverse_nested. This is planned for version 1.4.0 onwards, as I see; I will keep track of any updates on this. Thanks, Thomas. On Friday, 25 July 2014 14:54:50 UTC+3, Thomas wrote: ...
Re: elasticsearch init script for centos or rhel ?
The one from the elasticsearch CentOS rpm repository works fine here on EL6: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-repositories.html (there are also 1.0 and 1.1 repos; simply adjust the baseurl). The source is here: https://github.com/elasticsearch/elasticsearch/blob/master/src/rpm/init.d/elasticsearch ...but I recommend the rpm from the repo because of /etc/sysconfig, the install locations, etc.; it is much easier that way. ~Tom On 16.07.2014 09:12, Aesop Wolf wrote: Did you ever find a script that works on CentOS? I'm also looking for one. On Friday, March 14, 2014 9:18:04 AM UTC-7, Dominic Nicholas wrote: Thanks. Does anyone know of a version that uses /etc/rc.d/init.d/functions instead of /lib/lsb, that would work on CentOS and with elasticsearch 1.0.1? Dom On Friday, March 14, 2014 9:24:12 AM UTC-4, David Pilato wrote: Maybe this? https://github.com/elasticsearch/elasticsearch/blob/master/src/deb/init.d/elasticsearch -- David ;-) Twitter: @dadoonet / @elasticsearchfr / @scrutmydocs On 14 March 2014 at 14:19, Dominic Nicholas dominic.s...@gmail.com wrote: Hi - can someone please point me to an /etc/init.d script for elasticsearch 1.0.1 for CentOS or RHEL? Thanks
Re: Setting id of document with elasticsearch-hadoop that is not in source document
I was just curious whether there was a way of doing this without adding the field; I can add it if necessary. As for alternatives: what if, in addition to es.mapping.id, there were another property, say es.mapping.id.include.in.src, where you could specify whether the source field actually gets included in the source document? In elasticsearch you can create and update documents without having to include the id in the source document, so I think it would make sense to be able to do that with elasticsearch-hadoop as well. On Thursday, July 10, 2014 5:49:18 PM UTC-4, Costin Leau wrote: You need to specify the id of the document you want to update somehow. Since in es-hadoop things are batch focused, each doc needs its own id specified somehow, hence the use of 'es.mapping.id' to indicate its value. Is there a reason why this approach does not work for you - any alternatives that you have thought of? Cheers, On 7/7/14 10:48 PM, Brian Thomas wrote: I am trying to update an elasticsearch index using elasticsearch-hadoop. I am aware of the *es.mapping.id* configuration where you can specify the field in the document to use as an id, but in my case the source document does not have the id (I used elasticsearch's autogenerated id when indexing the document). Is it possible to specify the id to update without having to add a new field to the MapWritable object?
-- Costin
Re: Setting id of document with elasticsearch-hadoop that is not in source document
I was just curious whether there was a way of doing this without adding the field; I can add it if necessary. As for alternatives: what if, in addition to es.mapping.id, there were another property, say es.mapping.id.exclude, that would not include the id field in the source document? In elasticsearch you can create and update documents without having to include the id in the source document, so I think it would make sense to be able to do that with elasticsearch-hadoop as well. On Thursday, July 10, 2014 5:49:18 PM UTC-4, Costin Leau wrote: ...
Re: java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES when trying to query elasticsearch using spark
Here is the gradle build I was using originally:

    apply plugin: 'java'
    apply plugin: 'eclipse'

    sourceCompatibility = 1.7
    version = '0.0.1'
    group = 'com.spark.testing'

    repositories { mavenCentral() }

    dependencies {
        compile 'org.apache.spark:spark-core_2.10:1.0.0'
        compile 'edu.stanford.nlp:stanford-corenlp:3.3.1'
        compile group: 'edu.stanford.nlp', name: 'stanford-corenlp', version: '3.3.1', classifier: 'models'
        compile files('lib/elasticsearch-hadoop-2.0.0.jar')
        testCompile 'junit:junit:4.+'
        testCompile group: 'com.github.tlrx', name: 'elasticsearch-test', version: '1.2.1'
    }

When I ran dependencyInsight on jackson, I got the following output:

    C:\dev\workspace\SparkProject> gradle dependencyInsight --dependency jackson-core
    :dependencyInsight
    com.fasterxml.jackson.core:jackson-core:2.3.0
    \--- com.fasterxml.jackson.core:jackson-databind:2.3.0
         +--- org.json4s:json4s-jackson_2.10:3.2.6
         |    \--- org.apache.spark:spark-core_2.10:1.0.0
         |         \--- compile
         \--- com.codahale.metrics:metrics-json:3.0.0
              \--- org.apache.spark:spark-core_2.10:1.0.0 (*)

    org.codehaus.jackson:jackson-core-asl:1.0.1
    \--- org.codehaus.jackson:jackson-mapper-asl:1.0.1
         \--- org.apache.hadoop:hadoop-core:1.0.4
              \--- org.apache.hadoop:hadoop-client:1.0.4
                   \--- org.apache.spark:spark-core_2.10:1.0.0
                        \--- compile

Version 1.0.1 of jackson-core-asl does not have the field ALLOW_UNQUOTED_FIELD_NAMES, but later versions do. On Sunday, July 6, 2014 4:28:56 PM UTC-4, Costin Leau wrote: Hi, glad to see you sorted out the problem. Out of curiosity, what version of jackson were you using, and what was pulling it in? Can you share your maven pom/gradle build? On Sun, Jul 6, 2014 at 10:27 PM, Brian Thomas wrote: I figured it out, dependency issue in my classpath. Maven was pulling down a very old version of the jackson jar.
I added the following line to my dependencies and the error went away:

    compile 'org.codehaus.jackson:jackson-mapper-asl:1.9.13'

On Friday, July 4, 2014 3:22:30 PM UTC-4, Brian Thomas wrote: I am trying to test querying elasticsearch with Apache Spark using elasticsearch-hadoop. I am just trying to run a query against the elasticsearch server and return the count of results. Below is my test class using the Java API:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.MapWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.serializer.KryoSerializer;
    import org.elasticsearch.hadoop.mr.EsInputFormat;
    import scala.Tuple2;

    public class ElasticsearchSparkQuery {
        public static int query(String masterUrl, String elasticsearchHostPort) {
            SparkConf sparkConfig = new SparkConf().setAppName("ESQuery").setMaster(masterUrl);
            sparkConfig.set("spark.serializer", KryoSerializer.class.getName());
            JavaSparkContext sparkContext = new JavaSparkContext(sparkConfig);
            Configuration conf = new Configuration();
            conf.setBoolean("mapred.map.tasks.speculative.execution", false);
            conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
            conf.set("es.nodes", elasticsearchHostPort);
            conf.set("es.resource", "media/docs");
            conf.set("es.query", "?q=*");
            JavaPairRDD<Text, MapWritable> esRDD = sparkContext.newAPIHadoopRDD(conf,
                    EsInputFormat.class, Text.class, MapWritable.class);
            return (int) esRDD.count();
        }
    }

When I try to run this I get the following error:

    14/07/04 14:58:07 INFO executor.Executor: Running task ID 0
    14/07/04 14:58:07 INFO storage.BlockManager: Found block broadcast_0 locally
    14/07/04 14:58:07 INFO rdd.NewHadoopRDD: Input split: ShardInputSplit [node=[5UATWUzmTUuNzhmGxXWy_w/S'byll|10.45.71.152:9200],shard=0]
    14/07/04 14:58:07 WARN mr.EsInputFormat: Cannot determine task id...
    14/07/04 14:58:07 ERROR executor.Executor: Exception in task ID 0
    java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES
        at org.elasticsearch.hadoop.serialization.json.JacksonJsonParser.<clinit>(JacksonJsonParser.java:38)
        at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:75)
        at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:267)
        at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:75)
        at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.next(EsInputFormat.java:319)
        at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.nextKeyValue(EsInputFormat.java:255)
        at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:122)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
        at org.apache.spark.util.Utils$.getIteratorSize
Setting id of document with elasticsearch-hadoop that is not in source document
I am trying to update an elasticsearch index using elasticsearch-hadoop. I am aware of the *es.mapping.id* configuration where you can specify the field in the document to use as an id, but in my case the source document does not have the id (I used elasticsearch's autogenerated id when indexing the document). Is it possible to specify the id to update without having to add a new field to the MapWritable object?
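For what it's worth, a commonly suggested workaround along the lines Costin describes is to copy the autogenerated id into a source field, point es.mapping.id at it, and then filter it back out on write. The sketch below is an assumption to verify against your es-hadoop version's documentation: es.mapping.exclude in particular only exists in later es-hadoop releases, and docId is a hypothetical field name.

```
# field carrying the autogenerated id (hypothetical field name)
es.mapping.id = docId
# strip it from the stored _source again (verify es.mapping.exclude exists in your version)
es.mapping.exclude = docId
# perform updates rather than index operations
es.write.operation = update
```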
Re: java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES when trying to query elasticsearch using spark
I figured it out: it was a dependency issue in my classpath. Maven was pulling down a very old version of the jackson jar. I added the following line to my dependencies and the error went away:

    compile 'org.codehaus.jackson:jackson-mapper-asl:1.9.13'

On Friday, July 4, 2014 3:22:30 PM UTC-4, Brian Thomas wrote: ...
java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES when trying to query elasticsearch using spark
I am trying to test querying elasticsearch with Apache Spark using elasticsearch-hadoop. I am just trying to run a query against the elasticsearch server and return the count of results. Below is my test class using the Java API:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.MapWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.serializer.KryoSerializer;
    import org.elasticsearch.hadoop.mr.EsInputFormat;
    import scala.Tuple2;

    public class ElasticsearchSparkQuery {
        public static int query(String masterUrl, String elasticsearchHostPort) {
            SparkConf sparkConfig = new SparkConf().setAppName("ESQuery").setMaster(masterUrl);
            sparkConfig.set("spark.serializer", KryoSerializer.class.getName());
            JavaSparkContext sparkContext = new JavaSparkContext(sparkConfig);
            Configuration conf = new Configuration();
            conf.setBoolean("mapred.map.tasks.speculative.execution", false);
            conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
            conf.set("es.nodes", elasticsearchHostPort);
            conf.set("es.resource", "media/docs");
            conf.set("es.query", "?q=*");
            JavaPairRDD<Text, MapWritable> esRDD = sparkContext.newAPIHadoopRDD(conf,
                    EsInputFormat.class, Text.class, MapWritable.class);
            return (int) esRDD.count();
        }
    }

When I try to run this I get the following error:

    14/07/04 14:58:07 INFO executor.Executor: Running task ID 0
    14/07/04 14:58:07 INFO storage.BlockManager: Found block broadcast_0 locally
    14/07/04 14:58:07 INFO rdd.NewHadoopRDD: Input split: ShardInputSplit [node=[5UATWUzmTUuNzhmGxXWy_w/S'byll|10.45.71.152:9200],shard=0]
    14/07/04 14:58:07 WARN mr.EsInputFormat: Cannot determine task id...
    14/07/04 14:58:07 ERROR executor.Executor: Exception in task ID 0
    java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES
        at org.elasticsearch.hadoop.serialization.json.JacksonJsonParser.<clinit>(JacksonJsonParser.java:38)
        at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:75)
        at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:267)
        at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:75)
        at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.next(EsInputFormat.java:319)
        at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.nextKeyValue(EsInputFormat.java:255)
        at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:122)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
        at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1014)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
        at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
        at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
        at org.apache.spark.scheduler.Task.run(Task.scala:51)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

Has anyone run into this issue with the JacksonJsonParser?
Re: Proper parsing of String values like 1m, 1q HOUR etc.
Hi, thanks again for your time. What I'm trying to do is generate, for example, the time in milliseconds the same way the elasticsearch core does when you pass '1q' to the date histogram. I'm trying to simulate the interval of the date histogram without using the date histogram, if possible. What is the one-liner of code (if you can call it that) that transforms '1q' into milliseconds so that elasticsearch can produce the date histogram intervals? I want to compare my time intervals with the date histogram's, and they must be exactly the same. And please allow me one more question: since elasticsearch uses Joda, is the start of the week always considered Monday, independently of the timezone? Thanks!! Thomas. On Tuesday, 17 June 2014 18:31:37 UTC+3, Thomas wrote: Hi, I was wondering whether there is a proper utility class to parse the given values and get the duration in milliseconds, for values such as 1m (which means 1 minute), 1q (which means 1 quarter), etc. I have found that elasticsearch uses the class TimeValue, but it only parses up to weeks, and values such as WEEK and HOUR are not accepted. So is there any utility class in the elasticsearch source that does the job? (for histograms, ranges, wherever it is needed) Thank you, Thomas
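On the '1q' question: a quarter has no fixed millisecond length (adjacent quarters of the same year differ in days), which is why there is no one-liner conversion; the rounding is calendar-based. Below is a self-contained java.time sketch (my illustration, not the rounding code Elasticsearch actually uses internally) of quarter starts and of the Monday week start, which matches Joda's ISO-8601 convention:

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;

public class IntervalSketch {
    // Start of the quarter containing the given date (Jan 1, Apr 1, Jul 1, Oct 1).
    static LocalDate startOfQuarter(LocalDate d) {
        int firstMonth = ((d.getMonthValue() - 1) / 3) * 3 + 1;
        return LocalDate.of(d.getYear(), firstMonth, 1);
    }

    public static void main(String[] args) {
        LocalDate d = LocalDate.of(2014, 6, 17);
        // Quarters have no fixed millisecond length: Q1 2014 is 90 days, Q2 is 91.
        System.out.println(startOfQuarter(d));
        // ISO-8601 (and Joda, which Elasticsearch uses) starts weeks on Monday,
        // independent of the time zone.
        System.out.println(d.with(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY)));
    }
}
```

So rather than converting '1q' to a millisecond count, the comparison has to be done by rounding each timestamp to its calendar bucket start, as above.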
Re: Proper parsing of String values like 1m, 1q HOUR etc.
Hi Brian,

Thanks for your reply. I understand your point, but if you check the source code of TimeValue it does not support the quarter or the year, so I was wondering which class, if any, supports the transformation of the string "1q" or "1y" into milliseconds.

Thanks

On Tuesday, 17 June 2014 18:31:37 UTC+3, Thomas wrote:
Hi, I was wondering whether there is a proper utility class to parse the given values and get the duration in milliseconds [...]
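For what it's worth, here is a hypothetical TimeValue-style helper extended with q and y. The caveat from above applies: quarters and years are not fixed millisecond amounts (Elasticsearch's date_histogram rounds them on calendar boundaries), so the q/y branches below are labeled approximations, not what the core does:

```java
import java.util.concurrent.TimeUnit;

public class Intervals {
    // Hypothetical helper: parse interval strings the way TimeValue does for
    // s/m/h/d/w, plus approximate values for q and y. Elasticsearch itself
    // does NOT use a fixed millisecond count for "1q" or "1y".
    static long toMillis(String interval) {
        long n = Long.parseLong(interval.substring(0, interval.length() - 1));
        char unit = interval.charAt(interval.length() - 1);
        switch (unit) {
            case 's': return TimeUnit.SECONDS.toMillis(n);
            case 'm': return TimeUnit.MINUTES.toMillis(n);
            case 'h': return TimeUnit.HOURS.toMillis(n);
            case 'd': return TimeUnit.DAYS.toMillis(n);
            case 'w': return TimeUnit.DAYS.toMillis(7 * n);
            case 'q': return TimeUnit.DAYS.toMillis(91 * n);  // approximate: calendar quarters vary
            case 'y': return TimeUnit.DAYS.toMillis(365 * n); // approximate: ignores leap years
            default: throw new IllegalArgumentException("unknown unit: " + unit);
        }
    }

    public static void main(String[] args) {
        System.out.println(toMillis("1m")); // 60000
        System.out.println(toMillis("2h")); // 7200000
    }
}
```

If exact agreement with date_histogram buckets is required, calendar truncation (as in the quarter example earlier in the thread) is the safer route than any millisecond constant.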
Aggregation Framework, possible to get distribution of requests per user
Hi,

I wanted to ask whether it is possible, with the aggregation framework, to get the distribution of one specific type of document sent per user. I'm interested in the occurrences of documents per user, e.g.:

1000 users sent 1 document
500 users sent 2 documents
X number of unique users sent Y documents (each)
etc.

On each document I index the user_id. Is there a way to support such a query, or partially support it - e.g. get the first 10 rows of this kind of list rather than the exhaustive list? Can you give me some hint?

Thanks
Re: Aggregation Framework, possible to get distribution of requests per user
Hi David,

Thank you for your reply. So based on your suggestion I should maintain a document (e.g. per user) with some aggregated values, and update it as we move along with the indexing of our data, correct? This, though, would only give me totals; I cannot apply something like a range. I also found a similar discussion here: https://groups.google.com/forum/#!msg/elasticsearch/UsrCG2Abj-A/IDO9DX_PoQwJ. Maybe something similar to the terms and histogram aggregations could support this logic, for instance:

{
  "aggs": {
    "requests_distribution": {
      "distribution": {
        "field": "user_id",
        "interval": 50
      }
    }
  }
}

and the result could be:

{
  "aggregations": {
    "requests_distribution": {
      "buckets": [
        { "key": 0,   "doc_count": 2 },
        { "key": 50,  "doc_count": 400 },
        { "key": 150, "doc_count": 30 }
      ]
    }
  }
}

where the key represents a bucket of documents-per-user, e.g. the first bucket means 2 users have between 0 and 50 documents each. Just an idea.

Thanks
Thomas

On Tuesday, 24 June 2014 13:32:13 UTC+3, Thomas wrote:
Hi, I wanted to ask whether it is possible to get with the aggregation framework the distribution of one specific type of documents sent per user [...]
Re: Aggregation Framework, possible to get distribution of requests per user
My mistake, sorry. Here is an example. I have the request document:

"request": {
  "dynamic": "strict",
  "properties": {
    "time":    { "format": "dateOptionalTime", "type": "date" },
    "user_id": { "index": "not_analyzed", "type": "string" },
    "country": { "index": "not_analyzed", "type": "string" }
  }
}

I want to find the number of (unique) user_ids that have X number of documents, e.g. for country US, and ideally I need the full list, e.g.:

1000 users have 43 documents
...
100 users have 234 documents
150 users have 500 documents
etc.

In other words, the distribution of documents (requests) per unique-user count. Of course I understand that it is a pretty heavy operation in terms of memory, but we could limit it to the top 100 rows for instance, or find a workaround.

Thanks again for your time
Thomas

On Tuesday, 24 June 2014 13:32:13 UTC+3, Thomas wrote:
Hi, I wanted to ask whether it is possible to get with the aggregation framework the distribution of one specific type of documents sent per user [...]
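Until something like the sketched distribution aggregation exists, one option is a two-pass approach on the client (a sketch, not an ES feature): fetch per-user document counts with a terms aggregation on user_id (accepting its size limit), then bucket those counts locally:

```java
import java.util.*;

public class RequestDistribution {
    // Given per-user document counts (e.g. extracted from a terms aggregation
    // on user_id), bucket them into intervals: key = lower bound of a
    // documents-per-user range, value = number of users in that range.
    static SortedMap<Long, Long> distribution(Map<String, Long> docsPerUser, long interval) {
        SortedMap<Long, Long> buckets = new TreeMap<>();
        for (long count : docsPerUser.values()) {
            long key = (count / interval) * interval;
            buckets.merge(key, 1L, Long::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new HashMap<>();
        counts.put("u1", 1L);
        counts.put("u2", 2L);
        counts.put("u3", 55L);
        // Two users fall in the [0,50) bucket, one in [50,100):
        System.out.println(distribution(counts, 50)); // {0=2, 50=1}
    }
}
```

The obvious limitation is that a terms aggregation only returns the top N users, so with many distinct user_ids this gives the distribution of the heaviest users, not of the whole population.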
Re: Splunk vs. Elastic search performance?
We had a 2.2 TB/day installation of Splunk and ran it on VMware with 12 indexers and 2 search heads. Each indexer had 1000 IOPS guaranteed. The system is slow but OK to use. We tried Elasticsearch and were able to get the same performance with the same number of machines. Unfortunately, with Elasticsearch you need almost double the amount of storage, plus a LOT of patience to make it run. It took us six months to set it up properly, and even now the system is quite buggy and unstable, and from time to time we lose data with Elasticsearch. I don't recommend ELK for a critical production system; for just dev work it is OK, if you don't mind the hassle of setting it up and operating it. The costs you save by not buying a Splunk license you have to invest into consultants to get it up and running. Our dev teams hate Elasticsearch and prefer Splunk.

On Saturday, 19 April 2014 00:07:44 UTC+2, Mark Walkom wrote:
That's a lot of data! I don't know of any installations that big, but someone else might. What sort of infrastructure are you running Splunk on now, and what are your current and expected retention?
Regards,
Mark Walkom
Infrastructure Engineer, Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com

On 19 April 2014 07:33, Frank Flynn faultle...@gmail.com wrote:
We have a large Splunk instance. We load about 1.25 TB of logs a day. We have about 1,300 loaders (servers that collect and load logs - they may do other things too). As I look at Elasticsearch / Logstash / Kibana, does anyone know of a performance comparison guide? Should I expect to run on very similar hardware? More? Or less? Sure, it depends on exactly what we're doing, the exact queries and the frequency we'd run them, but I'm trying to get any kind of idea before we start. Are there any white papers or other documents about switching?
It seems an obvious choice, but I can find very few performance comparisons (I did see that Elasticsearch just hired the former VP of Products at Splunk, Gaurav Gupta - but there were few numbers in that article either).
Thanks, Frank
Need help, multiple aggregations with filters extremely slow, where to look for optimizations?
Hi,

I'm facing a performance issue with some aggregations I perform, and I need your help if possible. I have two documents, the request and the event. The request is the parent of the event. Below is a (sample) mapping:

"event": {
  "dynamic": "strict",
  "_parent": { "type": "request" },
  "properties": {
    "event_time": { "format": "dateOptionalTime", "type": "date" },
    "count":      { "type": "integer" },
    "event":      { "index": "not_analyzed", "type": "string" }
  }
}

"request": {
  "dynamic": "strict",
  "_id": { "path": "uniqueId" },
  "properties": {
    "uniqueId": { "index": "not_analyzed", "type": "string" },
    "user":     { "index": "not_analyzed", "type": "string" },
    "code":     { "type": "integer" },
    "country":  { "index": "not_analyzed", "type": "string" },
    "city":     { "index": "not_analyzed", "type": "string" }
  }
}

My cluster is becoming really big (almost 2 TB of data with billions of documents). I maintain one index per day and occasionally delete old indices. My daily index is about 20 GB. The version of Elasticsearch that I use is 1.1.1.

My problems start when I want to get some aggregations of events with criteria applied to the parent request document - for example, count the events of type "click" for country=US and code=12. What I was initially doing was to generate a scriptFilter for the request document (in Groovy), adding multiple aggregations in one search request. This ended up being very slow, so I removed the scripting logic and supported the logic in Java code instead. What seemed solved on my local machine changed nothing when I got back to the cluster: my app still performs really, really poorly. It takes more than 10 seconds to perform a search with ~10 sub-aggregations. What seems strange is that the cluster looks pretty OK with regard to load average, CPU, etc. Any hints on where to look to solve this?
I'd also like to be able to identify the bottleneck. Ask for any additional information you need - I didn't want to make this post too long to read.

Thank you
Re: Need help, multiple aggregations with filters extremely slow, where to look for optimizations?
Below is an example aggregation I perform. Are there any optimizations I can make - maybe disabling some features I do not need, etc.?

curl -XPOST "http://localhost:9200/logs-idx.20140613/event/_search?search_type=count" -d '
{
  "aggs": {
    "f1": {
      "filter": {
        "or": [
          {
            "and": [
              {
                "has_parent": {
                  "type": "request",
                  "filter": {
                    "and": {
                      "filters": [
                        { "term": { "country": "US" } },
                        { "term": { "city": "NY" } },
                        { "term": { "code": 12 } }
                      ]
                    }
                  }
                }
              },
              { "range": { "event_time": { "gte": "2014-06-13T10:00:00", "lt": "2014-06-13T11:00:00" } } }
            ]
          },
          {
            "and": [
              {
                "has_parent": {
                  "type": "request",
                  "filter": {
                    "and": {
                      "filters": [
                        { "term": { "country": "US" } },
                        { "term": { "city": "NY" } },
                        { "term": { "code": 12 } },
                        { "range": { "request_time": { "gte": "2014-06-13T10:00:00", "lt": "2014-06-13T11:00:00" } } }
                      ]
                    }
                  }
                }
              },
              { "range": { "event_time": { "lt": "2014-06-13T10:00:00" } } }
            ]
          }
        ]
      },
      "aggs": {
        "per_interval": {
          "date_histogram": { "field": "event_time", "interval": "minute" },
          "aggs": {
            "metrics": { "terms": { "field": "event", "size": 10 } }
          }
        }
      }
    }
  }
}'

On Friday, 13 June 2014 10:09:46 UTC+3, Thomas wrote:
Hi, I'm facing a performance issue with some aggregations I perform [...]
Re: Need help, multiple aggregations with filters extremely slow, where to look for optimizations?
So I restructured my curl as follows - is this what you mean? From some first tests I do get a slight improvement, but I need to check against production data. Thank you, I will try it and come back with results.

curl -XPOST "http://10.129.2.42:9200/logs-idx.20140613/event/_search?search_type=count" -d '
{
  "query": {
    "filtered": {
      "filter": {
        "or": [
          {
            "and": [
              {
                "has_parent": {
                  "type": "request",
                  "filter": {
                    "and": {
                      "filters": [
                        { "term": { "country": "US" } },
                        { "term": { "city": "NY" } },
                        { "term": { "code": 12 } }
                      ]
                    }
                  }
                }
              },
              { "range": { "event_time": { "gte": "2014-06-13T10:00:00", "lt": "2014-06-13T11:00:00" } } }
            ]
          },
          {
            "and": [
              {
                "has_parent": {
                  "type": "request",
                  "filter": {
                    "and": {
                      "filters": [
                        { "term": { "country": "US" } },
                        { "term": { "city": "NY" } },
                        { "term": { "code": 12 } },
                        { "range": { "request_time": { "gte": "2014-06-13T10:00:00", "lt": "2014-06-13T11:00:00" } } }
                      ]
                    }
                  }
                }
              },
              { "range": { "event_time": { "lt": "2014-06-13T10:00:00" } } }
            ]
          }
        ]
      }
    }
  },
  "aggs": {
    "per_interval": {
      "date_histogram": { "field": "event_time", "interval": "minute" },
      "aggs": {
        "metrics": { "terms": { "field": "event", "size": 12 } }
      }
    }
  }
}'

On Friday, 13 June 2014 10:09:46 UTC+3, Thomas wrote:
Hi, I'm facing a performance issue with some aggregations I perform [...]
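One more thing worth trying (a sketch, not a drop-in replacement - the field names and dates are taken from the thread): both or-branches repeat the same three term filters inside has_parent, so that parent filter is computed twice. Since every event has exactly one parent, has_parent(A) AND has_parent(B) is logically equivalent to a single has_parent(A AND B), which lets the common part be hoisted out and, with the 1.x line's filter caching, potentially reused:

```json
{
  "query": {
    "filtered": {
      "filter": {
        "and": [
          {
            "has_parent": {
              "type": "request",
              "filter": {
                "and": [
                  { "term": { "country": "US" } },
                  { "term": { "city": "NY" } },
                  { "term": { "code": 12 } }
                ]
              }
            }
          },
          {
            "or": [
              { "range": { "event_time": { "gte": "2014-06-13T10:00:00", "lt": "2014-06-13T11:00:00" } } },
              {
                "and": [
                  {
                    "has_parent": {
                      "type": "request",
                      "filter": { "range": { "request_time": { "gte": "2014-06-13T10:00:00", "lt": "2014-06-13T11:00:00" } } }
                    }
                  },
                  { "range": { "event_time": { "lt": "2014-06-13T10:00:00" } } }
                ]
              }
            ]
          }
        ]
      }
    }
  }
}
```

Whether this actually helps depends on how the has_parent filters are executed and cached on your cluster; verify that the result counts are identical before comparing timings.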
Re: Indexing nonstandard geo_point field.
I looked at the documentation for Elasticsearch's geo_shape and it looks like it uses [longitude, latitude]. I found this note on the geo_shape documentation page http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-geo-shape-type.html:

Note: In GeoJSON, and therefore Elasticsearch, the correct coordinate order is longitude, latitude (X, Y) within coordinate arrays. This differs from many Geospatial APIs (e.g., Google Maps) that generally use the colloquial latitude, longitude (Y, X).

An alternative I found was to use the computed fields plugin https://github.com/SkillPages/elasticsearch-computed-fields and create a mapping like this:

"@coordinates-str": {
  "type": "computed",
  "script": "_source.geo.coordinates[0] + ',' + _source.geo.coordinates[1]",
  "result": { "type": "geo_point", "store": true }
}

This seems to create the string in the correct format for the geo_point. The issue I am having with this method right now is that Elasticsearch returns an error if the source document does not have the geo.coordinates field.

On Sunday, June 1, 2014 4:28:24 PM UTC-4, Alexander Reelsen wrote:
Hey, you could index this as a geo shape (as this is valid GeoJSON). If you really need the functionality of a geo_point, you need to change the structure of the data.
--Alex

On Sat, May 31, 2014 at 3:36 PM, Brian Thomas mynam...@gmail.com wrote:
I am new to Elasticsearch and I am trying to index a JSON document with a nonstandard lat/long format [...]
Indexing nonstandard geo_point field.
I am new to Elasticsearch and I am trying to index a JSON document with a nonstandard lat/long format. I know the standard format for a geo_point array is [lon, lat], but the documents I am indexing have the format [lat, lon]. This is what the JSON element looks like:

"geo": {
  "type": "Point",
  "coordinates": [ 38.673459, -77.336781 ]
}

Is there any way I could have Elasticsearch reorder this array, or convert this field to a string, without having to modify the source document prior to indexing? Could this be done using a field mapping or script in Elasticsearch?
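If preprocessing before indexing turns out to be acceptable after all, the swap itself is trivial. A client-side sketch (note the related gotcha: a geo_point string is "lat,lon", the opposite order from the [lon, lat] array form, which is what the computed-fields script in the follow-up exploits):

```java
public class GeoFix {
    // Swap a [lat, lon] pair into the [lon, lat] order that geo_point
    // arrays (and GeoJSON) expect, before the document is sent to ES.
    static double[] toLonLat(double[] latLon) {
        return new double[] { latLon[1], latLon[0] };
    }

    public static void main(String[] args) {
        double[] fixed = toLonLat(new double[] { 38.673459, -77.336781 });
        System.out.println(fixed[0] + "," + fixed[1]); // -77.336781,38.673459
    }
}
```

Alternatively, emitting the original pair as the string "38.673459,-77.336781" also yields a valid geo_point, since the string form is lat-first.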
Question about week granularity elasticsearch uses
Hello,

I stepped into a situation where I need to truncate a timestamp field to the week, and I want to do it the exact way Elasticsearch does it in the date histogram aggregation, in order to be able to perform comparisons. Does anyone know how I should perform the truncation to the week? I notice that the date histogram returns the beginning of the week (Monday); is it safe to use Calendar as follows?

Calendar cal = Calendar.getInstance();
cal.setTimeZone(TimeZone.getTimeZone("GMT"));
// cal.set(Calendar.DAY_OF_WEEK, cal.getFirstDayOfWeek());
cal.set(Calendar.DAY_OF_WEEK, Calendar.MONDAY);
Date time = cal.getTime();
System.out.println("time = " + time);

Does the first day of the week depend on the Locale in Elasticsearch, or not?

Thank you
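One pitfall with the Calendar approach above: set(DAY_OF_WEEK) can roll forward or backward depending on the locale's first day of week, and the snippet never sets the timestamp being truncated. A sketch with java.time (JDK 8) sidesteps both; ISO week fields fix Monday as the first day independent of locale, which matches Joda's default and therefore the date_histogram:

```java
import java.time.*;
import java.time.temporal.*;

public class WeekTruncate {
    // Truncate a UTC timestamp to the Monday 00:00 that starts its ISO week,
    // matching what date_histogram "week" buckets return (Joda uses ISO
    // weeks, which always begin on Monday).
    static Instant startOfWeekUtc(Instant ts) {
        return ts.atZone(ZoneOffset.UTC)
                 .toLocalDate()
                 .with(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY))
                 .atStartOfDay(ZoneOffset.UTC)
                 .toInstant();
    }

    public static void main(String[] args) {
        Instant ts = Instant.parse("2014-06-19T15:30:00Z"); // a Thursday
        System.out.println(startOfWeekUtc(ts)); // 2014-06-16T00:00:00Z
    }
}
```

If you must stay on Calendar, call cal.setTime(yourDate) first and clear the sub-day fields (HOUR_OF_DAY, MINUTE, SECOND, MILLISECOND) after setting DAY_OF_WEEK, or the result keeps the original time of day.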
Clear cache on demand and Circuit breaker Problem
Hi,

I'm trying to get some aggregated information by querying Elasticsearch from my app. What I notice is that after some time I get a circuit breaker exception and my query fails. I can only assume that I load too much fielddata and eventually the circuit breaker stops my query.

Inside my application I have logic to run the query sequentially over time. For example, I split a query with a range of two hours into one query per half hour: instead of doing one query for the full two-hour period, I do 4 queries of half an hour each. This is something I can configure.

My question is whether it makes sense to perform a clearCache request between my requests (or every 15 minutes, for instance) in order to avoid the circuit breaker exception. I know it will make things slower, but to my mind it is better to perform a bit poorly than to stop the operation entirely. Given that the query remains the same (with different parameters), does this make sense, or will I just end up deleting and re-creating the same cache again and again?

client.admin().indices().prepareClearCache(indexName).get();

Are there other alternatives to avoid the circuit breaker in a more efficient way? Of course, if I leave it unbounded I eventually get a heap space exception.

Thank you
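Rather than clearing the cache between requests (which throws away work the next query immediately redoes), another option on the 1.x line is to bound the fielddata cache itself, so entries are evicted LRU-style before the breaker trips. A sketch for elasticsearch.yml - the percentages are illustrative, and the breaker setting name is the pre-1.4 one, so check the reference for your exact version:

```yaml
# Bound the fielddata cache so old entries are evicted instead of
# accumulating until the circuit breaker (or the heap) gives out.
indices.fielddata.cache.size: 40%

# Keep the breaker limit above the cache size so eviction kicks in first.
# (Pre-1.4 setting name; later releases use indices.breaker.fielddata.limit.)
indices.fielddata.breaker.limit: 60%
```

With the cache bounded, repeated identical queries re-load at most the evicted portion, which is usually cheaper than a full clear-and-reload cycle.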
Scripts reload on demand
Hi,

I was wondering whether there is a way to reload, on demand, the scripts provided under config/scripts. I'm facing a weird situation where, although the documentation describes that the scripts are reloaded every so often (configurable), I do not see that happening, and there is no way to pick up a new script I put there unless I restart my node(s). Is there a curl request to force a reload of the scripts? Additionally, is there any curl command that can display which scripts are loaded into an ES node and which are not? I use Elasticsearch 1.1.1 and my scripts are in Groovy (with the groovy-lang plugin installed).

Thank you
access parent bucket's key from child aggregation in geohash grid
Hello!

I have been progressing well with aggregations, but this one has got me stumped. I'm trying to figure out how to access the key of the parent bucket from a child aggregation. The parent bucket is a geohash_grid, and the child aggregation is avg (trying to get the average lat and lon, but only for points that match the parent bucket's geohash key). Something like this:

"aggregations": {
  "LocationsGrid": {
    "geohash_grid": {
      "field": "Locations",
      "precision": 7
    },
    "aggregations": {
      "avg_lat": {
        "avg": {
          "script": "if (doc['Locations'].value.geohash.startsWith(parent_bucket.key)) doc['Locations'].value.lat;"
        }
      }
    }
  }
}

(parent_bucket.key is the part I don't know how to express.) Thanks for any help or ideas with this!
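As far as I know, the parent bucket's key is not exposed to scripts in child aggregations in this version. If Locations is single-valued, though, the check is unnecessary: every document in a bucket got there because its geohash matched the cell, so a plain avg over lat/lon gives the per-cell centroid. A sketch under that single-valued assumption:

```json
{
  "aggregations": {
    "LocationsGrid": {
      "geohash_grid": { "field": "Locations", "precision": 7 },
      "aggregations": {
        "avg_lat": { "avg": { "script": "doc['Locations'].value.lat" } },
        "avg_lon": { "avg": { "script": "doc['Locations'].value.lon" } }
      }
    }
  }
}
```

The multi-valued case (several points per document, only some inside the cell) is exactly where this breaks down; that seems to require either denormalizing to one point per document or post-processing on the client.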
Upgrade cluster from 0.90.11 to 1.1.1
I'm running a two-node cluster with Elasticsearch 0.90.11. I want to upgrade to the newest version (1.1.1), but I'm not entirely sure how to do it. 0.90.11 is based on Lucene 4.6.1 and 1.1.1 on Lucene 4.7.2. Can I do the following?

1. Stop node 1.
2. Install 1.1.1 on node 1.
3. Copy the data folder to 1.1.1.
4. Start node 1 and wait for it to synchronize.
5. Stop node 2.
6. Install 1.1.1 on node 2.
7. Copy the data folder to 1.1.1.
8. Start node 2 and wait for it to synchronize.

I can live with downtime if it is not possible otherwise.
Re: Parent/Child combination Script possible?
Thanks Sven,

Yes, this would solve a lot of use cases. Can anyone say whether we should create an issue for that? The link provided does not mention whether this should finally be opened as an issue.

Thanks
Thomas

On Friday, 11 April 2014 18:53:08 UTC+3, Thomas wrote:
Hello, I have two document types in a parent/child relation, and I want to perform an aggregation where the script uses fields from both documents [...]
Parent/Child combination Script possible?
Hello,

I have two document types in a parent/child relation. I want to perform an aggregation where the script uses fields from both documents. Is that possible? More specifically:

Parent document:

{
  "tag": {
    "_id": { "path": "tag_id" },
    "properties": {
      "tag_id":      { "index": "not_analyzed", "type": "string" },
      "name":        { "index": "not_analyzed", "type": "string" },
      "tag_counter": { "type": "integer" }
    }
  }
}

Child document:

{
  "click": {
    "_parent": { "type": "tag" },
    "properties": {
      "type":           { "index": "not_analyzed", "type": "string" },
      "clicks_counter": { "type": "integer" }
    }
  }
}

curl -XGET "http://localhost:9200/tags-index/tags,clicks/_search" -d '
{
  "aggregations": {
    "one_day_filter": {
      "filter": {
        "range": {
          "ts": { "gte": "2014-03-15T00:00:00", "lt": "2014-03-15T01:00:00" }
        }
      },
      "aggregations": {
        "parent": {
          "filter": {
            "has_child": { "type": "clicks", "query": { "match_all": {} } }
          },
          "aggregations": {
            "metrics": {
              "terms": {
                "script": "doc[\"tags.tag_counter\"].value - doc[\"clicks.clicks_counter\"].value"
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}'

Thanks
Thomas
Terms aggregation scripts running slower than expected
Hi, I am currently exploring the option of using scripts with aggregations, and I noticed that for some reason scripts for terms aggregations execute much more slowly than for other aggregations, even if the script doesn't access any fields yet. This also happens for native Java scripts. I'm running Elasticsearch 1.1.0. For example, on my data set the trivial script 1 takes around 400ms for the sum and histogram aggregations, but around 25s on a terms aggregation, even on repeated runs. What is going on here? Terms aggregations without a script are very fast, and histogram/sum aggregations with scripts that access the document are also very fast. I had to transform a script aggregation that should have been a terms aggregation into a histogram and convert the numeric values back into terms on the client so the aggregation would execute in reasonable time.

In [2]: app.search.search({'size': 0, 'query': { 'match_all': {} }, 'aggregations': { 'test_script': { 'terms': { 'script': '1' } } }})
Out[2]: {u'_shards': {u'failed': 0, u'successful': 246, u'total': 246}, u'aggregations': {u'test_script': {u'buckets': [{u'doc_count': 4231327, u'key': u'1'}]}}, u'hits': {u'hits': [], u'max_score': 0.0, u'total': 4231327}, u'timed_out': False, u'took': 24986}

In [10]: app.search.search({'size': 0, 'query': { 'match_all': {} }, 'aggregations': { 'test_script': { 'sum': { 'script': '1' } } }})
Out[10]: {u'_shards': {u'failed': 0, u'successful': 246, u'total': 246}, u'aggregations': {u'test_script': {u'value': 4231327.0}}, u'hits': {u'hits': [], u'max_score': 0.0, u'total': 4231327}, u'timed_out': False, u'took': 363}

In [8]: app.search.search({'size': 0, 'query': { 'match_all': {} }, 'aggregations': { 'test_script': { 'histogram': { 'script': '1', 'interval': 1 } } }})
Out[8]: {u'_shards': {u'failed': 0, u'successful': 246, u'total': 246}, u'aggregations': {u'test_script': {u'buckets': [{u'doc_count': 4231327, u'key': 1}]}}, u'hits': {u'hits': [], u'max_score': 0.0, u'total': 4231327}, u'timed_out': False, u'took': 421}

Thomas -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4af8942c-db46-47fa-9d38-370051a15c5c%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
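The client-side workaround described above (running the cheap histogram aggregation over numeric codes, then decoding the keys back into terms) can be sketched like this. The mapping, helper names, and the commented client call are illustrative, not from the original post:

```python
# Hypothetical mapping from the numeric codes stored in the index back to terms.
CODE_TO_TERM = {1: "click", 2: "open", 3: "view"}

def decode_buckets(buckets, code_to_term):
    """Rewrite histogram buckets {'key': 1, 'doc_count': n} into
    term-style buckets {'key': 'click', 'doc_count': n}."""
    return [
        {"key": code_to_term[b["key"]], "doc_count": b["doc_count"]}
        for b in buckets
    ]

# resp = app.search.search({
#     "size": 0,
#     "aggregations": {
#         "by_type": {"histogram": {"script": "doc['type_code'].value",
#                                   "interval": 1}}
#     },
# })
# terms = decode_buckets(resp["aggregations"]["by_type"]["buckets"], CODE_TO_TERM)
```

The decoding step is pure client-side work, so the slow terms-with-script path is avoided entirely.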
Re: query field computed by child query?
Thanks for the examples. Looks quite interesting. If I understand correctly, I'd have to write a plugin doing my subquery. Too bad I don't have much time right now :( Sounds like an interesting challenge :)
query field computed by child query?
I have documents in a parent/child relation. In a query run on the parent, I'd like to know if the found parents have children matching some query. I don't want to filter only parents meeting some conditions on the child, but only get the information that they have children matching some query. Any idea if that's possible? I've been thinking of maybe adding a script_field that would compute that, but I have no idea how to run child queries from a script field. An example to clarify my problem: the child has a boolean field error. I run a query on the parent and want to show an indication if any of the children has the error flag set. Any hint would be welcome.
Re: query field computed by child query?
I want to return all parents (or those matching some other query conditions), but in addition to the other data in the document, I want to compute for each parent whether it has any child with a set error flag. I don't want to filter on this condition in this case. On Friday, 28 March 2014 14:21:30 UTC+1, Binh Ly wrote: Not sure I understand. So if you run a _search on the parent, and use the has_child filter to return only parents that match some child condition, is that not what you want?
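One possible way to get this "flag without filtering" behaviour is a named has_child clause inside a bool should: every parent is still returned, and hits whose children match carry the query name in matched_queries. This is only a sketch under that assumption; the type and field names are made up, and whether the name propagates this way should be verified against your Elasticsearch version:

```python
def parents_with_child_flag_query():
    """Build a query body that returns all parents and tags those that
    have at least one child with error=true (hypothetical field/type names)."""
    return {
        "query": {
            "bool": {
                "must": [{"match_all": {}}],
                "should": [
                    {
                        "has_child": {
                            "type": "child_doc",          # hypothetical child type
                            "query": {"term": {"error": True}},
                            "_name": "has_error_child",   # should appear in matched_queries
                        }
                    }
                ],
            }
        }
    }

# for hit in es.search(body=parents_with_child_flag_query())["hits"]["hits"]:
#     has_error = "has_error_child" in hit.get("matched_queries", [])
```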
Inconsistent search cluster status and search results after long GC run
) at org.elasticsearch.discovery.zen.ZenDiscovery$7.execute(ZenDiscovery.java:556) at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:308) at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:134) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) Caused by: org.elasticsearch.transport.NodeNotConnectedException: [node2][inet[/10.216.32.81:9300]] Node not connected at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:859) at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:540) at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:189) ... 7 more NODE 2 [2014-03-27 07:19:02,871][INFO ][cluster.service ] [node2] removed {[node3][RRqWlTWnQ7ygvsOaJS0_mA][node3][inet[/10.235.38.84:9300]]{master=true},}, reason: zen-disco-node_failed([node3][RRqWlTWnQ7ygvsOaJS0_mA][node3][inet[/10.235.38.84:9 300]]{master=true}), reason failed to ping, tried [2] times, each with maximum [9s] timeout NODE 3 [2014-03-27 07:19:20,055][WARN ][monitor.jvm ] [node3] [gc][old][539697][754] duration [35.1s], collections [1]/[35.8s], total [35.1s]/[2.7m], memory [4.9gb]-[4.2gb]/[7.9gb], all_pools {[young] [237.8mb]-[7.4mb]/[266.2mb]}{[survivor] [25.5mb]-[0b]/[33 .2mb]}{[old] [4.6gb]-[4.2gb]/[7.6gb]} [2014-03-27 07:19:20,112][INFO ][discovery.zen] [node3] master_left [[node2][A45sMYqtQsGrwY5exK0sEg][node2][inet[/10.216.32.81:9300]]{master=true}], reason [do not exists on master, act as master failure] [2014-03-27 07:19:20,117][INFO ][cluster.service ] [node3] master {new [node1][DxlcpaqOTmmpNSRoqt1sZg][node1.example][inet[/10.252.78.88:9300]]{data=false, master=true}, previous 
[node2][A45sMYqtQsGrwY5exK0sEg][node2][inet[/10.216.32.81:9300 ]]{master=true}}, removed {[node2][A45sMYqtQsGrwY5exK0sEg][node2][inet[/10.216.32.81:9300]]{master=true},}, reason: zen-disco-master_failed ([node2][A45sMYqtQsGrwY5exK0sEg][node2][inet[/10.216.32.81:9300]]{master=true}) After this scenario, the cluster doesn't recover properly: The worst thing is that node 1 sees nodes 1+3, node 2 sees nodes 1+2 and node 3 sees nodes 1+3. Since the cluster is set up to operate with two nodes, both data nodes 2 and 3 accept data and searches, causing inconsistent results and requiring us to do a full cluster restart and reindex all production data to make sure the cluster is consistent again. NODE 1 (GET /_nodes): { cluster_name : elasticsearch, nodes : { DxlcpaqOTmmpNSRoqt1sZg : { name : node1, ... }, RRqWlTWnQ7ygvsOaJS0_mA : { name : node3, ... } } } NODE 2 (GET /_nodes): { cluster_name : elasticsearch, nodes : { A45sMYqtQsGrwY5exK0sEg : { name : node2, ... }, DxlcpaqOTmmpNSRoqt1sZg : { name : node1, ... } } } NODE 3 (GET /_nodes): { cluster_name : elasticsearch, nodes : { DxlcpaqOTmmpNSRoqt1sZg : { name : node1, ... }, RRqWlTWnQ7ygvsOaJS0_mA : { name : node3, ... } } } Here are the configurations: BASE CONFIG (for all nodes): action: disable_delete_all_indices: true discovery: zen: fd: ping_retries: 2 ping_timeout: 9s minimum_master_nodes: 2 ping: multicast: enabled: false unicast: hosts: [node1.example, node2.example, node3.example] index: fielddata: cache: node indices: fielddata: cache: size: 40% memory: index_buffer_size: 20% threadpool: bulk: queue_size: 100 type: fixed transport: tcp: connect_timeout: 3s NODE 1: node: data: false master: true name: node1 NODE 2: node: data: true master: true name: node2 NODE 3: node: data: true master: true name: node3 Questions: 1) What can we do to minimize long GC runs, so the nodes don't become unresponsive and disconnect in the first place? 
(FYI: Our index is currently about 80 GB in size with over 2M docs (per node), 60 shards, heap size 8 GB. We run both searches and aggregations on it.) 2) Obviously, having the cluster in a state like the above is unacceptable, and we therefore want to make sure that even if a node disconnects because of GC, the cluster can fully recover and only one of the two data nodes can accept data and searches while a node is disconnected. Is there anything that needs to be changed in the Elasticsearch code to fix this issue? Thanks, Thomas
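For reference, the discovery.zen.minimum_master_nodes value in the config above can be checked against the usual quorum rule (a strict majority of master-eligible nodes must be visible before a master is elected):

```python
def minimum_master_nodes(master_eligible_count):
    """Quorum rule: floor(master_eligible / 2) + 1."""
    return master_eligible_count // 2 + 1

# With the three master-eligible nodes in the setup above:
print(minimum_master_nodes(3))  # → 2, matching the configured value
```

The configured value of 2 is therefore correct for this three-node topology, which suggests the split views stem from the fault-detection timeouts rather than the quorum setting itself.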
Split brain problem on Azure
I'm experiencing a split-brain problem on my Elasticsearch cluster on Azure, consisting of two nodes. I've read about the zen.ping.timeout and discovery.zen.minimum_master_nodes settings, but I guess that I can't use those settings when using the Azure plugin. Any ideas for avoiding split brain using the Azure plugin?
Re: Split brain problem on Azure
Ok. Also using the zen.* keys?
Re: Inconsistent search cluster status and search results after long GC run
Thanks Jörg, I can increase the ping_timeout to 60s for now. However, shouldn't the goal be to minimize the time GC runs? Is the node blocked while GC runs, delaying any requests to it? If so, it would be very bad to allow long GC runs. Regarding the bulk thread pool: I specifically set this to a higher value to avoid errors when we perform bulk indexing (we sometimes had errors when the queue was full and set to 50; I was also going to increase the index queue since there are sometimes errors). I will try keeping the limit and giving more heap space to indexing instead, as you suggested. Regarding Java 8: We're currently running Java 7 and haven't tweaked any GC-specific settings. Do you think it makes sense to already switch to Java 8 on production and enable the G1 garbage collector? Thanks again, Thomas On Thursday, March 27, 2014 9:41:10 PM UTC+1, Jörg Prante wrote: It seems you ran into trouble because you changed some of the default settings, worsening your situation. Increase ping_timeout from 9s to 60s as a first band-aid; you have GCs running for 35s. You should reduce the bulk thread pool queue from 100 to 50; this reduces high memory pressure on the 20% memory you allow. Give more heap space to indexing: use 50% instead of 20%. Better help would be to diagnose whether the nodes exceed their capacity for search and index operations. If so, think about adding nodes. More fine-tuning after adding nodes could include the G1 GC with Java 8, which is targeted at minimizing GC stalls. This would not solve node capacity problems, though. Jörg On Thu, Mar 27, 2014 at 4:46 PM, Binh Ly binh...@yahoo.com wrote: I would probably not master-enable any node that can potentially GC for a couple of seconds. You want your master-eligible nodes to make decisions as quickly as possible. About your GC situation, I'd find out what the underlying cause is: 1) Do you have bootstrap.mlockall set to true? 2) Does it usually get triggered while running queries?
Or is there a pattern on when it usually triggers? 3) Is there anything else running on these nodes that would overload and affect normal ES operations?
Re: Inconsistent search cluster status and search results after long GC run
Forgot to reply to your questions, Binh: 1) No, I haven't set this. However, I wonder if this has any significant effect since swap space is barely used. 2) It seems to happen when the cluster is under high load, but I haven't seen any specific pattern so far. 3) No, there's not. There's a very small Redis instance running on node1, but there's nothing else on the nodes with shards (where the GC problem happens). If I were going to disable master on any node that has shards, I'd have to add another dummy node with master:true so the cluster is in a good state if any one of the nodes is down. On Thursday, March 27, 2014 4:46:41 PM UTC+1, Binh Ly wrote: I would probably not master-enable any node that can potentially GC for a couple of seconds. You want your master-eligible nodes to make decisions as quickly as possible. About your GC situation, I'd find out what the underlying cause is: 1) Do you have bootstrap.mlockall set to true? 2) Does it usually get triggered while running queries? Or is there a pattern on when it usually triggers? 3) Is there anything else running on these nodes that would overload and affect normal ES operations?
Delete by query fails often with HTTP 503
Hi, We often get failures when using the delete by query API. The response is an HTTP 503 with a body like this: {_indices: {myindex: {_shards: {successful: 2, failed: 58, total: 60 Is there a way to figure out what is causing this error? It seems to mostly happen when the search cluster is busy. Thomas
Re: Delete by query fails often with HTTP 503
Thanks Clint, We have two nodes with 60 shards per node. I will increase the queue size. Hopefully this will reduce the amount of rejections. Thomas On Tuesday, March 18, 2014 6:11:27 PM UTC+1, Clinton Gormley wrote: Do you have lots of shards on just a few nodes? Delete by query is handled by the `index` thread pool, but those threads are shared across all shards on a node. Delete by query can produce a large number of changes, which can fill up the thread pool queue and result in rejections. You can either (a) just retry or (b) increase the queue size for the `index` thread pool (which will use more memory, as more delete requests will need to be queued). See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html#types clint On 18 March 2014 08:13, Thomas S. thom...@gmail.com wrote: Hi, We often get failures when using the delete by query API. The response is an HTTP 503 with a body like this: {_indices: {myindex: {_shards: {successful: 2, failed: 58, total: 60 Is there a way to figure out what is causing this error? It seems to mostly happen when the search cluster is busy. Thomas
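For reference, a sketch of raising the `index` thread pool queue via the cluster settings API. The elasticsearch-py client object `es` and the transient update are assumptions; setting threadpool.index.queue_size in elasticsearch.yml with a rolling restart works as well, and the tradeoff Clint mentions (more queued requests means more memory) still applies:

```python
def index_queue_settings(queue_size):
    """Build a transient cluster-settings body raising the `index`
    thread pool queue (queued delete-by-query changes use this pool)."""
    return {"transient": {"threadpool.index.queue_size": queue_size}}

# es.cluster.put_settings(body=index_queue_settings(200))
```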
Unable to load script under config/scripts
Hi, I'm trying to keep some scripts within config/scripts but Elasticsearch seems unable to locate them. What could be a possible reason for this? When the script needs to be invoked, ES fails with the following: No such property: scriptname for class: Script1 Any ideas? Thanks
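A hedged sketch of the usual setup: a script stored as config/scripts/myscript.groovy is referenced by its bare file name (no extension). The "No such property" error above looks like the name was compiled as inline script source instead, which happens when the file isn't found, so it's worth checking that the extension matches the script lang and that the node picked the file up. All names below are placeholders:

```python
def script_agg(script_name, lang="groovy"):
    """Build a search body whose aggregation references a file script
    by name (file expected under config/scripts/<name>.<lang>)."""
    return {
        "size": 0,
        "aggregations": {
            "by_script": {"terms": {"script": script_name, "lang": lang}}
        },
    }

# es.search(body=script_agg("myscript"))
```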
Re: Queue capacity and EsRejectedExecutionException leads to loss of data
Thanks David, So this is a RabbitMQ river issue; is there a need to open a separate issue? (I've never done the procedure, I will look into it.) Thomas On Wednesday, 26 February 2014 15:48:55 UTC+2, Thomas wrote: Hi, We have installed the RabbitMQ river plugin to pull data from our queue and add it to ES. The thing is that at some point we receive the following exception, and as a result we *lose data*: [1775]: index [events-idx], type [click], id [3f6e4604146b435aabcf4ea5a493fd32], message [EsRejectedExecutionException[rejected execution (queue capacity 50) on org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1@12843ca2]] We changed the queue size configuration to 1000 and the problem disappeared. My question is: is there any configuration/way to tell ES, instead of throwing this exception and discarding the document, to wait for available resources (with the corresponding performance impact)? Thanks Thomas
Re: Avoiding duplicate documents with versioning
Just for any other people who might find this post useful: we finally managed to get the expected functionality as described here. Thanks Thomas On Saturday, 15 February 2014 16:53:20 UTC+2, Thomas wrote: Hi, First of all congrats on the 1.0 release!! Thumbs up for the aggregation framework :) I'm trying to build a system which is kind of querying for analytics. I have a document called *event*, and I have events of a specific type (e.g. click, open etc.) per page. So per page I might have for example an *open event*. The thing is that I might receive the open event *more than once*, but I want to count it only once. So I use the versioning API and provide the same document id, with the result that the version increases. In my queries I use the _timestamp field to determine the last document that I counted. But my problem is that since ES reindexes the document, it updates _timestamp, so it looks like a recent document and my queries count it again. Is there a way to simply *discard* the document if a document with the same id exists, without stopping the bulk operation of uploading documents? Thanks Thomas
Avoiding duplicate documents with versioning
Hi, First of all congrats on the 1.0 release!! Thumbs up for the aggregation framework :) I'm trying to build a system which is kind of querying for analytics. I have a document called *event*, and I have events of a specific type (e.g. click, open etc.) per page. So per page I might have for example an *open event*. The thing is that I might receive the open event *more than once*, but I want to count it only once. So I use the versioning API and provide the same document id, with the result that the version increases. In my queries I use the _timestamp field to determine the last document that I counted. But my problem is that since ES reindexes the document, it updates _timestamp, so it looks like a recent document and my queries count it again. Is there a way to simply *discard* the document if a document with the same id exists, without stopping the bulk operation of uploading documents? Thanks Thomas
Re: Avoiding duplicate documents with versioning
Just an update: if we use op_type=create in the index request, it will probably discard the duplicate document. But in the case where we do a bulk operation, will it stop the bulk upload? Or will it generate the error and move on to the next document? thanks On Saturday, 15 February 2014 16:53:20 UTC+2, Thomas wrote: Hi, First of all congrats on the 1.0 release!! Thumbs up for the aggregation framework :) I'm trying to build a system which is kind of querying for analytics. I have a document called *event*, and I have events of a specific type (e.g. click, open etc.) per page. So per page I might have for example an *open event*. The thing is that I might receive the open event *more than once*, but I want to count it only once. So I use the versioning API and provide the same document id, with the result that the version increases. In my queries I use the _timestamp field to determine the last document that I counted. But my problem is that since ES reindexes the document, it updates _timestamp, so it looks like a recent document and my queries count it again. Is there a way to simply *discard* the document if a document with the same id exists, without stopping the bulk operation of uploading documents? Thanks Thomas
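For what it's worth, the bulk API reports errors per item and continues with the remaining documents, so a create conflict does not stop the upload. A small sketch of inspecting a bulk response for conflicts (the response shape follows the bulk API; index and type names are placeholders from the thread):

```python
def summarize_bulk(response):
    """Count created vs. already-existing (version conflict) items in a
    bulk response containing "create" actions."""
    created = conflicts = 0
    for item in response["items"]:
        result = item.get("create", {})
        if result.get("status") == 409:   # a document with this id already exists
            conflicts += 1
        else:
            created += 1
    return created, conflicts

# actions alternate header and source lines, e.g.:
# {"create": {"_index": "events-idx", "_type": "click", "_id": doc_id}}
# {...event source...}
# created, conflicts = summarize_bulk(es.bulk(body=actions))
```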
Marvel housekeeping
I upgraded Elasticsearch to 0.90.11 and installed Marvel. Congratulations on a really nice tool! Now I have a small issue: since Marvel is generating quite a lot of data (for our development system), I would like to configure an automatic delete of old data. Is there such an option? I didn't find anything in the documentation. It would be great to specify a rolling window of n days of data to keep.
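Marvel writes one index per day named .marvel-YYYY.MM.DD, so a rolling window can be kept with a scheduled job that deletes the old daily indices. A sketch, assuming the elasticsearch-py client for the commented delete call:

```python
import datetime

def expired_marvel_indices(index_names, keep_days, today):
    """Return the daily .marvel-* indices older than the keep window."""
    cutoff = today - datetime.timedelta(days=keep_days)
    expired = []
    for name in index_names:
        try:
            day = datetime.datetime.strptime(name, ".marvel-%Y.%m.%d").date()
        except ValueError:
            continue  # skip non-daily indices such as .marvel-kibana
        if day < cutoff:
            expired.append(name)
    return expired

# Run daily from cron:
# for name in expired_marvel_indices(es.indices.get_aliases().keys(), 7,
#                                    datetime.date.today()):
#     es.indices.delete(index=name)
```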
ElasticSearch Analytics Capabilities
ES seems to have the ability to run analytic queries. I have read about people using it as an OLAP solution [1], although I have not yet read anyone describe their experience. In that respect, how do ES's analytics capabilities compare against: 1) Dremel clones [2] like Impala and Presto (for near real-time, ad hoc analytic queries over large datasets) 2) Lambda Architecture [3] systems (where queries are known up-front, but need to run against a large dataset) Does anyone here have experience running ES in such use cases, beyond the free-text searching ES is well-known for? Thanks, Binil [1]: https://groups.google.com/forum/#!topic/elasticsearch/iTy9IYL23as [2]: http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/36632.pdf [3]: http://jameskinley.tumblr.com/post/37398560534/the-lambda-architecture-principles-for-architecting
Re: EC2 Discovery is not working with AutoScaling group (AWS)
Finally, I fixed my problem. There was a mistake in the field discovery.ec2.groups: instead of a string, I had to put an array of strings. And I also forgot to add the tag platform:prod to CloudFormation when launching my stack. Fixed! On Friday, 7 February 2014 14:54:05 UTC+1, Thomas FATTAL wrote: Hi, I'm trying to configure two Elasticsearch nodes in AWS in the same autoscaling group (CloudFormation). I am having some problems with them discovering each other. The following shows the elasticsearch.log I have on the first machine, with the instance-id i-2db5db03. The second machine has the instance-id i-324e6612. It seems that both nodes recognize each other thanks to the discovery.ec2.tag.* field I added, but then there are some problems that prevent them from joining together: [2014-02-07 13:17:08,852][INFO ][node ] [ip-10-238-225-133.ec2.internal] version[1.0.0.Beta2], pid[15342], build[296cfbe/2013-12-02T15:46:27Z] [2014-02-07 13:17:08,853][INFO ][node ] [ip-10-238-225-133.ec2.internal] initializing ...
[2014-02-07 13:17:08,917][INFO ][plugins ] [ip-10-238-225-133.ec2.internal] loaded [cloud-aws], sites [paramedic] [2014-02-07 13:17:15,452][DEBUG][discovery.zen.ping.unicast] [ip-10-238-225-133.ec2.internal] using initial hosts [], with concurrent_connects [10] [2014-02-07 13:17:15,455][DEBUG][discovery.ec2] [ip-10-238-225-133.ec2.internal] using ping.timeout [3s], master_election.filter_client [true], master_election.filter_data [false] [2014-02-07 13:17:15,456][DEBUG][discovery.zen.elect ] [ip-10-238-225-133.ec2.internal] using minimum_master_nodes [1] [2014-02-07 13:17:15,457][DEBUG][discovery.zen.fd ] [ip-10-238-225-133.ec2.internal] [master] uses ping_interval [1s], ping_timeout [30s], ping_retries [3] [2014-02-07 13:17:15,500][DEBUG][discovery.zen.fd ] [ip-10-238-225-133.ec2.internal] [node ] uses ping_interval [1s], ping_timeout [30s], ping_retries [3] [2014-02-07 13:17:16,769][DEBUG][discovery.ec2] [ip-10-238-225-133.ec2.internal] using host_type [PRIVATE_IP], tags [{platform=prod}], groups [[]] with any_group [true], availability_zones [[]] [2014-02-07 13:17:19,930][INFO ][node ] [ip-10-238-225-133.ec2.internal] initialized [2014-02-07 13:17:19,931][INFO ][node ] [ip-10-238-225-133.ec2.internal] starting ... [2014-02-07 13:17:20,455][INFO ][transport] [ip-10-238-225-133.ec2.internal] bound_address {inet[/0.0.0.0:9300]}, publish_address {inet[/10.238.225.133:9300]} [2014-02-07 13:17:20,527][TRACE][discovery] [ip-10-238-225-133.ec2.internal] waiting for 30s for the initial state to be set by the discovery [2014-02-07 13:17:21,981][TRACE][discovery.ec2] [ip-10-238-225-133.ec2.internal] building dynamic unicast discovery nodes... 
[2014-02-07 13:17:21,982][TRACE][discovery.ec2] [ip-10-238-225-133.ec2.internal] filtering out instance i-2db5db03 based tags {platform=prod}, not part of [{Key: aws:cloudformation:stack-id, Value: arn:aws:cloudformation:us-east-1:876119091332:stack/ES-10/daf53050-8ff8-11e3-bdce-50e241629418, }, {Key: aws:cloudformation:stack-name, Value: ES-10, }, {Key: aws:cloudformation:logical-id, Value: ESASG, }, {Key: aws:autoscaling:groupName, Value: ES-10-ESASG-BHGX7KKQ9QPR, }] [2014-02-07 13:17:21,983][TRACE][discovery.ec2] [ip-10-238-225-133.ec2.internal] filtering out instance i-324e6612 based tags {platform=prod}, not part of [{Key: aws:cloudformation:logical-id, Value: ESASG, }, {Key: aws:cloudformation:stack-id, Value: arn:aws:cloudformation:us-east-1:876119091332:stack/ES-10/daf53050-8ff8-11e3-bdce-50e241629418, }, {Key: aws:cloudformation:stack-name, Value: ES-10, }, {Key: aws:autoscaling:groupName, Value: ES-10-ESASG-BHGX7KKQ9QPR, }] [2014-02-07 13:17:21,983][DEBUG][discovery.ec2] [ip-10-238-225-133.ec2.internal] using dynamic discovery nodes [] [2014-02-07 13:17:23,744][TRACE][discovery.ec2] [ip-10-238-225-133.ec2.internal] building dynamic unicast discovery nodes... [2014-02-07 13:17:23,745][TRACE][discovery.ec2] [ip-10-238-225-133.ec2.internal] filtering out instance i-2db5db03 based tags {platform=prod}, not part of [{Key: aws:cloudformation:stack-id, Value: arn:aws:cloudformation:us-east-1:876119091332:stack/ES-10/daf53050-8ff8-11e3-bdce-50e241629418, }, {Key: aws:cloudformation:stack-name, Value: ES-10, }, {Key: aws:cloudformation:logical-id, Value: ESASG, }, {Key: aws:autoscaling:groupName, Value: ES-10-ESASG-BHGX7KKQ9QPR, }] [2014-02-07 13:17:23,745][TRACE][discovery.ec2] [ip-10-238-225-133.ec2.internal] filtering out instance i-324e6612 based tags {platform=prod}, not part of [{Key: aws:cloudformation:logical-id, Value: ESASG, }, {Key: aws:cloudformation:stack-id, Value
Deployment of a ES cluster on AWS
Hi! I want to deploy a cluster of Elasticsearch nodes on AWS. All our existing infrastructure uses CloudFormation with Chef cookbooks. We also set up an Auto Scaling group to restart application nodes automatically when some go down. I have several questions concerning the ES cluster I am trying to set up: 1) I was wondering what the best practices are for managing an ES cluster on AWS. Is it recommended to put the EC2 ES nodes in an auto-scaling group as well? Or is that a problem for EC2 discovery? 2) If the CPU goes to 100% on a machine, is it recommended to upgrade the machine type to something more powerful, or to add a new node? 3) Is there a recommended configuration schema in terms of the number of nodes in the cluster? Thanks a lot for your answers, Thomas (@nypias)
Re: There were no results because no indices were found that match your selected time span
Okay, thanks!

On Tuesday, January 28, 2014 8:53:27 PM UTC+1, David Pilato wrote:
Should work from 0.90.9.
-- David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr

On 28 January 2014 at 20:51:14, Thomas Ardal (thoma...@gmail.com) wrote:
I know and that's the plan. But with 1.0.0 right around the corner and a lot of data to migrate, I'll probably wait for that one. Does Marvel only support the most recent versions of ES?

On Tuesday, January 28, 2014 8:43:26 PM UTC+1, David Pilato wrote:
0.90.1? You should update to 0.90.10.
-- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 28 January 2014 at 20:11, Thomas Ardal thoma...@gmail.com wrote:
As bonus info, I'm running Elasticsearch 0.90.1 on Windows Server 2012. I'm using the Jetty plugin to force https and basic authentication, but am accessing Marvel from localhost through http. My browser asks me for credentials when opening the Marvel URL, so it could be caused by the basic authentication setup. Or?

On Tuesday, January 28, 2014 8:01:21 PM UTC+1, Thomas Ardal wrote:
When trying out Marvel on my Elasticsearch installation, I get the error "There were no results because no indices were found that match your selected time span" at the top of the page. If I understand the documentation, Marvel automatically collects statistics from all indexes on the node. What am I doing wrong?
There were no results because no indices were found that match your selected time span
When trying out Marvel on my Elasticsearch installation, I get the error "There were no results because no indices were found that match your selected time span" at the top of the page. If I understand the documentation, Marvel automatically collects statistics from all indexes on the node. What am I doing wrong?
Re: There were no results because no indices were found that match your selected time span
As bonus info, I'm running Elasticsearch 0.90.1 on Windows Server 2012. I'm using the Jetty plugin to force https and basic authentication, but am accessing Marvel from localhost through http. My browser asks me for credentials when opening the Marvel URL, so it could be caused by the basic authentication setup. Or?

On Tuesday, January 28, 2014 8:01:21 PM UTC+1, Thomas Ardal wrote:
When trying out Marvel on my Elasticsearch installation, I get the error "There were no results because no indices were found that match your selected time span" at the top of the page. If I understand the documentation, Marvel automatically collects statistics from all indexes on the node. What am I doing wrong?
Re: There were no results because no indices were found that match your selected time span
I know and that's the plan. But with 1.0.0 right around the corner and a lot of data to migrate, I'll probably wait for that one. Does Marvel only support the most recent versions of ES?

On Tuesday, January 28, 2014 8:43:26 PM UTC+1, David Pilato wrote:
0.90.1? You should update to 0.90.10.
-- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 28 January 2014 at 20:11, Thomas Ardal thoma...@gmail.com wrote:
As bonus info, I'm running Elasticsearch 0.90.1 on Windows Server 2012. I'm using the Jetty plugin to force https and basic authentication, but am accessing Marvel from localhost through http. My browser asks me for credentials when opening the Marvel URL, so it could be caused by the basic authentication setup. Or?

On Tuesday, January 28, 2014 8:01:21 PM UTC+1, Thomas Ardal wrote:
When trying out Marvel on my Elasticsearch installation, I get the error "There were no results because no indices were found that match your selected time span" at the top of the page. If I understand the documentation, Marvel automatically collects statistics from all indexes on the node. What am I doing wrong?
Re: parent/child where parent exists query
Hi Adrien, and thanks for the reply. This sounds like what I was looking for :) Will investigate it. Thanks, Thomas

On Thursday, 23 January 2014 20:25:01 UTC+2, Thomas wrote:
Hi, I have been working on a parent/child schema and I was wondering if there is a way to search child documents with a query where the parent exists, and get only those documents. I can index child documents, and it is not mandatory for them to have a parent document. Therefore, I want to get only the children that have a parent document. Is this functionality possible? How is it possible to perform such a query? I work with the latest version, 1.0.0.RC1. Thank you, Thomas
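[Editorial note] The suggestion referenced above is presumably the has_parent query, which matches child documents whose parent satisfies the given query; with a match_all parent query it effectively means "children that have a parent". A minimal sketch, with parentDoc as a placeholder parent type name:

```json
{
  "query": {
    "has_parent": {
      "parent_type": "parentDoc",
      "query": { "match_all": {} }
    }
  }
}
```

Run against the child type's _search endpoint, this returns only child documents that have an existing parent document.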
Re: parent/child where parent exists query
Hi, I have two questions with regard to this feature:

1) I have been reading the documentation, and at some point it says that all _id values are loaded into memory (heap) in order to support fast lookups. Does this mean only the parent _ids that match? Not all the _ids of the parent/child relationship?

2) Is it possible to perform an aggregation that takes some values from the parent document and some from the child document? For example, take some information from the parent document (e.g. city) and some from the child document (e.g. carModel = BMW), do some counts, and present them all together? e.g.

{
  "parentDoc": {
    "properties": {
      "city": { "type": "string", "index": "not_analyzed" }
    }
  }
}

{
  "childDoc": {
    "properties": {
      "carModel": { "type": "string", "index": "not_analyzed" }
    }
  }
}

Aggregate child documents, per child document where a parent exists (because I have child documents without parent info and I do not want to count those), per city and carModel. Below is an idea of what I'm trying to do:

curl -XGET 'localhost:9200/delivery_logs_pc_idx/childDoc/_search?pretty' -d '{
  "query": { "match_all": {} },
  "aggregations": {
    "myFilter": {
      "filter": {
        "has_parent": {
          "type": "deliveryLog",
          "query": { "match_all": {} }
        }
      },
      "aggregations": {
        "preferrence": {
          "terms": {
            "script": "doc.parent.city.value + doc.child.carModel.value",
            "size": 100
          }
        }
      }
    }
  },
  "size": 0
}'

Is there an alternative way of achieving that? Thank you

On Thursday, 23 January 2014 20:25:01 UTC+2, Thomas wrote:
Hi, I have been working on a parent/child schema and I was wondering if there is a way to search child documents with a query where the parent exists, and get only those documents. I can index child documents, and it is not mandatory for them to have a parent document. Therefore, I want to get only the children that have a parent document. Is this functionality possible? How is it possible to perform such a query? I work with the latest version, 1.0.0.RC1. Thank you, Thomas
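[Editorial note] On question 2: parent and child documents are separate Lucene documents, so as far as I know a child-level aggregation script cannot reach into parent fields the way doc.parent.city.value suggests. A common workaround is to denormalize the parent field (here city) onto each child document at index time and aggregate on the copy. A sketch of that aggregation, reusing the index, type, and field names from the example above (the denormalized city field on the child is an assumption):

```json
{
  "size": 0,
  "aggregations": {
    "with_parent": {
      "filter": {
        "has_parent": { "type": "deliveryLog", "query": { "match_all": {} } }
      },
      "aggregations": {
        "by_city": {
          "terms": { "field": "city", "size": 100 },
          "aggregations": {
            "by_car_model": {
              "terms": { "field": "carModel", "size": 100 }
            }
          }
        }
      }
    }
  }
}
```

Nested terms aggregations give the city/carModel counts directly, without a script, and only over children whose parent exists.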
Re: Cannot set string type to analyzed
Thanks for your reply. Here is an example:

{
  "query": {
    "filtered": {
      "query": { "term": { "userId": "testUser1" } },
      "filter": {
        "nested": {
          "path": "extra",
          "query": {
            "filtered": {
              "query": { "match_all": {} },
              "filter": {
                "bool": {
                  "must": [
                    { "term": { "key": "city" } },
                    { "term": { "value": "Stockholm" } }
                  ]
                }
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}

I have inserted data with key city and value Stockholm in my scenario. The thing is that if I change the mapping to not_analyzed and run the same scenario, I get back data, whereas if I set index to analyzed, or do not set it at all (the default value), then I do not get back results. This leads me to the conclusion that somehow analyzed is not accepted when setting the mapping.

Thomas

On Tuesday, December 17, 2013 5:10:58 PM UTC+2, Thomas wrote:
Hi, I'm trying to create a mapping for a nested document and I realize that I cannot set the index type of a string field to analyzed:

{
  "action": {
    "_all": { "enabled": false },
    "_type": { "index": "no" },
    "_timestamp": { "enabled": true, "store": "yes" },
    "_routing": { "required": true, "path": "userId" },
    "properties": {
      "userId": { "type": "string", "index": "not_analyzed" },
      "extra": {
        "type": "nested",
        "properties": {
          "key": { "type": "string", "index": "not_analyzed" },
          "value": { "type": "string", "index": "analyzed" }
        }
      }
    }
  }
}

The above ends up with the following result when calling for the mapping (curl -XGET localhost:9200/my_index/action/_mapping?pretty):

{
  "action": {
    "_all": { "enabled": false },
    "_type": { "index": "no" },
    "_timestamp": { "enabled": true, "store": "yes" },
    "_routing": { "required": true, "path": "userId" },
    "properties": {
      "userId": { "type": "string", "index": "not_analyzed", "omit_norms": true, "index_options": "docs" },
      "extra": {
        "type": "nested",
        "properties": {
          "key": { "type": "string", "index": "not_analyzed", "omit_norms": true, "index_options": "docs" },
          "value": { "type": "string" }
        }
      }
    }
  }
}

Should it display the index type as analyzed? Furthermore, I cannot search based on this field. Should I search by defining the analyzer? ES version 0.90.7, but I have noticed this since version 0.90.2. Probably I'm doing something wrong here.

Looking forward to your reply. Thank you
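[Editorial note] What is most likely happening here (my reading, not stated in the thread): analyzed is the default index setting for string fields, so Elasticsearch simply omits it when echoing the mapping back; the setting was accepted. The missing search results are a separate issue: a term filter is not analyzed at query time, while the standard analyzer lowercases "Stockholm" to "stockholm" at index time, so the exact term "Stockholm" never matches. A sketch of a filter that should match the analyzed field, assuming the standard analyzer and the mapping above (the full extra.* field paths are my phrasing of the same fields):

```json
{
  "query": {
    "filtered": {
      "query": { "term": { "userId": "testUser1" } },
      "filter": {
        "nested": {
          "path": "extra",
          "filter": {
            "bool": {
              "must": [
                { "term": { "extra.key": "city" } },
                { "term": { "extra.value": "stockholm" } }
              ]
            }
          }
        }
      }
    }
  },
  "size": 0
}
```

Alternatively, a match query on extra.value would analyze the query string with the same analyzer used at index time, so "Stockholm" would match without manual lowercasing.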