Bug in Blog
There was a useful blog post about D3 and aggregations: http://webcache.googleusercontent.com/search?q=cache:QhzvfcKiM70J:www.elasticsearch.org/blog/data-visualization-elasticsearch-aggregations/ Now I get this page instead, which is mostly broken: https://www.elastic.co/blog/data-visualization-elasticsearch-aggregations/ -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0a364878-f542-4150-98d4-414fd6d7b58e%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Unexpected behavior from nested - filter - nested aggregation
I'm also running into this, and it is not what I expected. I tried parent/child and got the same behavior. I expect the filtering to narrow down the results with each filter: I filter on a child (or nested) document that has property=p, then go back to aggregate on the parent, and I get all the results again as if the filter were not applied. I can include sample data, mappings, and queries if someone wants to comment. I'm trying to do clickstream analysis on session events and user actions. The data models I am trying treat the session as the parent event with user actions as the child (or nested) events. I've tried several different models and have not found one that works well.
Re: Can I use Elasticsearch as my primary store?
I have heard from the source: do not use Elasticsearch as a primary data store. But some people do, and it works OK. I would recommend that you use the snapshot and restore features, and back up your JSON source data so you can re-index in case your index gets corrupted. And be careful upgrading, especially across breaking versions. On Friday, October 24, 2014 2:32:56 PM UTC-7, Akram Hussein wrote: Is it a use case today to use Elasticsearch as a primary store? Basically using it similar to MongoDB? Is that a use case the product is moving towards, or is it mostly just for search?
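To make the snapshot-and-restore suggestion concrete, here is a minimal sketch of the requests involved. The repository name `my_backup` and the backup location are hypothetical placeholders, and the code only prints the requests rather than sending them to a cluster.

```python
import json

# Hypothetical repository name and shared-filesystem location.
repo_body = {
    "type": "fs",
    "settings": {"location": "/mnt/backups/es"}
}

# Registering the repository:  PUT /_snapshot/my_backup
# Taking a snapshot:           PUT /_snapshot/my_backup/snapshot_1?wait_for_completion=true
# Restoring it later:          POST /_snapshot/my_backup/snapshot_1/_restore
requests = [
    ("PUT", "/_snapshot/my_backup", repo_body),
    ("PUT", "/_snapshot/my_backup/snapshot_1?wait_for_completion=true", None),
    ("POST", "/_snapshot/my_backup/snapshot_1/_restore", None),
]

for method, path, body in requests:
    print(method, path)
    if body is not None:
        print(json.dumps(body, indent=2))
```

For a shared-filesystem repository the location must be accessible from every node in the cluster.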
Re: nested aggregation against key value pairs
Have you tried the usual sub-aggregations? It looks like they should do exactly what you want. If you have tried them, why did they not work? Can you include some sample data and the queries you have tried, so that we can index it and try your queries? Bucketing aggregations can have sub-aggregations (bucketing or metric). The sub-aggregations will be computed for the buckets which their parent aggregation generates. http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations.html On Friday, October 24, 2014 12:17:04 PM UTC-7, Jay Hilden wrote: I have an ES type with a nested KeyValuePair type. What I'm trying to do is a terms aggregation on both the key and value fields such that I'd get the following results: Key1 - Value1: DocCount = 10 Key1 - Value2: DocCount = 9 Key2 - Value3: DocCount = 4 Here is my mapping:

{
  "index123": {
    "mappings": {
      "type123": {
        "properties": {
          "authEventID": { "type": "long" },
          "authInput": {
            "properties": {
              "uIDExtensionFields": {
                "type": "nested",
                "properties": {
                  "key": { "type": "string" },
                  "value": { "type": "string" }
                }
              }
            }
          }
        }
      }
    }
  }
}

Is there a way to do this? Thank you.
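For what it's worth, a nested aggregation wrapping a terms aggregation on `key` with a terms sub-aggregation on `value` is the usual shape for this. A sketch of the request body (field names taken from the mapping above; the aggregation names are made up):

```python
import json

query = {
    "size": 0,
    "aggs": {
        "kv_pairs": {  # made-up aggregation name
            "nested": {"path": "authInput.uIDExtensionFields"},
            "aggs": {
                "keys": {
                    "terms": {"field": "authInput.uIDExtensionFields.key"},
                    "aggs": {
                        "values": {
                            "terms": {"field": "authInput.uIDExtensionFields.value"}
                        }
                    }
                }
            }
        }
    }
}

print(json.dumps(query, indent=2))
```

Each `keys` bucket in the response then contains `values` sub-buckets whose doc_count gives the per-pair counts (Key1 - Value1: DocCount = 10, and so on).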
Re: Unexpected behavior from nested - filter - nested aggregation
Lastly, it is not possible to “cross reference” between nested documents. One nested doc cannot “see” another nested doc’s properties. For example, you are not able to filter on “A.name” but facet on “B.age”. You can get around this by using `include_in_root`, which effectively copies the nested docs into the root, but this gets you back to the problems of inner objects. http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/ Perhaps this is our answer? On Friday, October 24, 2014 3:31:11 PM UTC-7, Zennet Wheatcroft wrote: I'm also running into this, and it is not what I expected. I tried parent/child and got the same behavior. I expect the filtering to narrow down the results with each filter. ...
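A sketch of what `include_in_root` looks like in a mapping (the field names here are placeholders, not from the thread). With this set, each nested object's fields are additionally indexed on the root document, which is exactly why the per-object pairing is lost there:

```python
import json

mapping = {
    "properties": {
        "events": {                   # hypothetical nested field
            "type": "nested",
            "include_in_root": True,  # also index the fields on the root document
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "long"}
            }
        }
    }
}

print(json.dumps(mapping, indent=2))
```

Queries against the root then see a flattened bag of all nested values, the same behavior as plain inner objects.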
Re: ElasticSearch Having-Clause?
As the open issue you quote suggests, ES currently has no support for an equivalent to SQL’s HAVING clause. Here's another reference which supports that: https://groups.google.com/forum/#!msg/elasticsearch/UsrCG2Abj-A/IDO9DX_PoQwJ What I did as a workaround is to get all the results in an intermediate layer and then loop through them, leaving out the ones not meeting my boolean criteria (COUNT(*) = x). But that is not really a solution to your problem of too many results. I had 200,000 results, which worked fine, but if I had 200M that would not work so well. And it won't work for any of the aggregation functions (sum, min, max, avg) other than count, as far as I can tell. Have you considered the 'min_doc_count: 50' feature? http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_minimum_document_count I used it to filter out all the groups that have fewer than x documents and then manually removed the groups with more than x. In your case it looks like you want to filter on something like HAVING SUM(impression) >= 50, and I don't think there is even a workaround for that, because the functions, and even script filters, are applied to documents, not to the aggregations. At least as far as I can tell. It would be great if someone showed me otherwise. Zennet On Wednesday, February 5, 2014 7:50:01 AM UTC-8, tke...@rundsp.com wrote: Hi guys, I’m using elasticsearch 1.0.0RC2 and wondering if there is an equivalent to SQL’s “having-clause” for the aggregation framework there. Below is an example query and a link to a ticket that describes the issue well. The part of the query that's highlighted doesn't work, and is there purely to give an idea of what I'm after. This query (omitting the highlighted portion) gives impression counts for every placement-referer-device-date combo. This is fine, but the output is HUGE!
I was wondering if there was a way (like a having clause or filter) to reduce the amount of results based on some logic (in this case, impression counts greater than 50). Thanks all! - Trev https://github.com/elasticsearch/elasticsearch/issues/4404

curl -XPOST //_search?pretty=true -d '
{
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "range": {
          "date_time": {
            "from": "ZZZ",
            "to": "",
            "include_lower": true,
            "include_upper": true
          }
        }
      }
    }
  },
  "aggs": {
    "placement": {
      "terms": { "field": "placement" },
      "aggs": {
        "device": {
          "terms": { "field": "device" },
          "aggs": {
            "referer": {
              "terms": { "field": "referer" },
              "aggs": {
                "totals": {
                  "date_histogram": { "field": "date_time", "interval": "day" },
                  "aggs": {
                    "impression": {
                      "sum": { "field": "impression" },
                      "having": { "from": 50 }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
'
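The intermediate-layer workaround described above amounts to post-filtering the aggregation response on the client. A rough sketch, where the bucket shape follows the usual terms + sum response format and the metric name and threshold come from the example:

```python
def having(buckets, metric="impression", threshold=50):
    """Keep only buckets whose sum-metric value meets the threshold,
    i.e. a client-side HAVING SUM(impression) >= 50."""
    return [b for b in buckets if b[metric]["value"] >= threshold]

# Example buckets as they would appear inside an aggregation response.
buckets = [
    {"key": "placement-1", "doc_count": 120, "impression": {"value": 90.0}},
    {"key": "placement-2", "doc_count": 10, "impression": {"value": 7.0}},
]
print([b["key"] for b in having(buckets)])  # → ['placement-1']
```

As noted in the reply, this only works when the full bucket list fits in the intermediate layer; it does nothing to shrink the response Elasticsearch has to produce.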
Re: Securing Data in Elasticsearch
If we want to use Kibana we will run into the same issue. I heard Shay say that Kibana really was not developed for the use case of exposing it to external customers, but he did not elaborate on that. What I was thinking of doing is wrapping ES in a simple web app that forwards GET requests from Kibana on to ES (keeping the same API) but blocks DELETE, PUT, and POST requests, returning a 501 Not Implemented. Do you think that would work for maintaining functionality while disallowing updates and deletes? Would that work for your requirements? Zennet On Thursday, June 12, 2014 7:48:47 AM UTC-7, Harvii Dent wrote: Hello, I'm planning to use Elasticsearch with Logstash for log management and search; however, one thing I'm unable to find an answer for is making sure that the data cannot be modified once it reaches Elasticsearch. action.destructive_requires_name prevents deleting all indices at once, but they can still be deleted. Are there any options to prevent deleting indices altogether? And on the document level, is it possible to disable 'delete' *AND* 'update' operations without setting the entire index as read-only (i.e. 'index.blocks.read_only')? Lastly, does setting 'index.blocks.read_only' ensure that the index files on disk are not changed (so they can be monitored using a file integrity monitoring solution)? Many regulatory and compliance bodies have requirements for ensuring log integrity. Thanks
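The read-only wrapper idea boils down to a policy on the HTTP method. A rough sketch of that policy follows; note it is only an illustration, and since search requests are commonly sent as POST bodies, a blanket POST block would likely break querying, hence the hypothetical `_search` carve-out:

```python
def allow_request(method: str, path: str) -> bool:
    """Hypothetical policy for a read-only proxy in front of Elasticsearch:
    permit read verbs, permit POST only for searches, reject everything else
    (the wrapper would answer those with 501 Not Implemented)."""
    if method in ("GET", "HEAD"):
        return True
    if method == "POST" and path.rstrip("/").endswith("_search"):
        return True
    return False  # DELETE, PUT, and all other POSTs are blocked

print(allow_request("GET", "/logstash-2014.06.12/_search"))   # True
print(allow_request("POST", "/logstash-2014.06.12/_search"))  # True
print(allow_request("DELETE", "/_all"))                       # False
print(allow_request("PUT", "/logstash-2014.06.12/event/1"))   # False
```

A real proxy would enforce this per request before forwarding to port 9200; the sketch only captures the allow/deny decision.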
Re: Inter-document Queries
I simplified the actual problem in order to avoid explaining the domain-specific details. Allow me to add back more detail. We want to be able to search for multiple points of user action along a conversion funnel, and condition on multiple fields. Let's add another field (response) to the above model:

{.., "path": "/promo/A", "response": 200, ..}
{.., "path": "/page/1", "response": 401, ..}
{.., "path": "/promo/D", "response": 200, ..}
{.., "path": "/page/23", "response": 301, ..}
{.., "path": "/page/2", "response": 418, ..}

Let's say we define three points through the conversion funnel: A: visited path=/page/1 B: got response=401 from some path C: exited at path=/sale/C And we want to know how many users did steps A-B-C in that order. If we add an array prev_response like we did for prev_path, then we can use a term filter to find documents with term path=/sale/C and prev_path=/page/1 and prev_response=401. But this will not distinguish between A-B-C and B-A-C. Perhaps I could use the script filter for the last mile: starting from the term-filtered results, throw out B-A-C, and it will run more quickly because of the reduced document set. Is there another way to implement this query? Zennet On Wednesday, June 4, 2014 5:01:19 PM UTC-7, Itamar Syn-Hershko wrote: You need to be able to form buckets that can be reduced again, either using the aggregations framework or a query. One model that will allow you to do that is something like this: { "userid": "xyz", "path": "/sale/B", "previous_paths": [...], "tstamp": ..., ... } So whenever you add a new path, you denormalize and add previous paths that could be relevant. This might bloat your storage a bit and be slower on writes, but it is very optimized for reads, since now you can do an aggregation that queries for the desired path and buckets on the user. To check the condition of the previous path you should be able to bucket again using a script, or maybe even with a query on a nested type.
This is just off the top of my head but should definitely work if you can get to that model -- Itamar Syn-Hershko http://code972.com | @synhershko https://twitter.com/synhershko Freelance Developer & Consultant Author of RavenDB in Action http://manning.com/synhershko/ On Thu, Jun 5, 2014 at 2:36 AM, Zennet Wheatcroft zwhea...@atypon.com wrote: Yes. I can re-index the data or transform it in any way to make this query efficient. What would you suggest? On Wednesday, June 4, 2014 2:14:09 PM UTC-7, Itamar Syn-Hershko wrote: This model is not efficient for this type of querying. You cannot do this in one query using this model, and the pre-processing work you do now + traversing all documents is very costly. Is it possible for you to index the data (even as a projection) into Elasticsearch using a different model, so you can use ES properly using queries or the aggregations framework? -- Itamar Syn-Hershko http://code972.com | @synhershko https://twitter.com/synhershko Freelance Developer & Consultant Author of RavenDB in Action http://manning.com/synhershko/ On Thu, Jun 5, 2014 at 12:04 AM, Zennet Wheatcroft zwhea...@atypon.com wrote: Hi, I am looking for an efficient way to do inter-document queries in Elasticsearch. Specifically, I want to count the number of users that went through an exit point B after visiting point A. In general terms, say we have some event log data about users' actions on a website:

{"userid": "xyz", "machineid": 110530745, "path": "/promo/A", "country": "US", "tstamp": "2013-04-01 00:01:01"}
{"userid": "pdq", "machineid": 110519774, "path": "/page/1", "country": "CN", "tstamp": "2013-04-01 00:02:11"}
{"userid": "xyz", "machineid": 110530745, "path": "/promo/D", "country": "US", "tstamp": "2013-04-01 00:06:31"}
{"userid": "abc", "machineid": 110527022, "path": "/page/23", "country": "DE", "tstamp": "2013-04-01 00:08:00"}
{"userid": "pdq", "machineid": 110519774, "path": "/page/2", "country": "CN", "tstamp": "2013-04-01 00:08:55"}
{"userid": "xyz", "machineid": 110530745, "path": "/sale/B", "country": "US", "tstamp": "2013-04-01 00:09:46"}
{"userid": "abc", "machineid": 110527022, "path": "/promo/A", "country": "DE", "tstamp": "2013-04-01 00:10:46"}

And we have 500+M such entries. We want a count of the number of userids that visited path=/sale/B after visiting path=/promo/A. What I did was to preprocess the data, sorting by userid and tstamp, then compacting all events with the same userid into the same document. Then I wrote a script filter which traverses the path array per document, and returns true if it finds an occurrence of /promo/A followed later by /sale/B. This however is inefficient. Most of our queries take 1 or 2 seconds on 100+M events. This script filter query takes over 300 seconds. Specifically, it can process events at about 400K events per second. By comparison, I wrote a naive program that does a linear pass of the un-compacted data and that processes 11M events per second. From which I conclude that Elasticsearch does not do well on this type of query. I am hoping someone can indicate a more efficient way to do this query in ES.
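The naive linear pass described above (group events by user, order by timestamp, scan for A followed later by B) can be sketched like this, using a few of the sample events; function and variable names are illustrative:

```python
from collections import defaultdict

def count_a_then_b(events, a="/promo/A", b="/sale/B"):
    """Count userids whose event stream contains path a followed later by path b."""
    by_user = defaultdict(list)
    for e in events:
        by_user[e["userid"]].append((e["tstamp"], e["path"]))
    hits = 0
    for visits in by_user.values():
        visits.sort()                   # order each user's events by timestamp
        seen_a = False
        for _, path in visits:
            if path == a:
                seen_a = True
            elif path == b and seen_a:  # b after a: this user counts once
                hits += 1
                break
    return hits

events = [
    {"userid": "xyz", "path": "/promo/A", "tstamp": "2013-04-01 00:01:01"},
    {"userid": "pdq", "path": "/page/1",  "tstamp": "2013-04-01 00:02:11"},
    {"userid": "xyz", "path": "/promo/D", "tstamp": "2013-04-01 00:06:31"},
    {"userid": "xyz", "path": "/sale/B",  "tstamp": "2013-04-01 00:09:46"},
    {"userid": "abc", "path": "/promo/A", "tstamp": "2013-04-01 00:10:46"},
]
print(count_a_then_b(events))  # → 1  (only xyz visited /promo/A then /sale/B)
```

A single-threaded pass like this is cheap per event, which matches the reported 11M events/second for the out-of-ES program versus 400K/second for the script filter.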
Re: Exposing elastic search query APIs at a public endpoint
Hi Pradeep, We are in the middle of doing the same thing: designing a system for reporting. And I want to create a middle API layer for the reasons you suggest, among others. I would like to exchange notes with you in a private message, if you want. You have to create some middle layer, right? You don't want to let users issue a DELETE request for http://yourhost:9200/_all/. Zennet On Tuesday, June 10, 2014 12:03:08 AM UTC-7, Pradeep Narayan wrote: Hi - We are designing a system for reporting and are planning to use Elasticsearch as a backend. We want to expose reporting in such a way that users can build custom reports on top of their data without us coming in their way. One way to do this is to expose the Elasticsearch query APIs through our public endpoints. The other option is to use an abstraction language which gets translated to Elasticsearch queries in the middle tier. The latter option allows us to control what runs on ES but can become restrictive in terms of how much of the rich query mechanism of ES we expose through the abstraction layer. I would like to know if there is a known design pattern to solve this. How have users of Elasticsearch addressed flexibility vs. control? Regards, Pradeep
Re: Inter-document Queries
Thank you Itamar and Jörg for your replies. I followed your suggestion, Itamar, and it works: queries that took 300+ seconds are now 400 ms again. However, this model increases stored-space complexity by O(N^2), which is usually not acceptable, so I would not consider this a general method. It works here because the median length of a user session is about 3, but we have sessions with 100s of events; if the median length of a session were 1000, this method would no longer work. Any other ideas or refinements? Or is this the best we can do with Elasticsearch? Zennet On Wednesday, June 4, 2014 5:01:19 PM UTC-7, Itamar Syn-Hershko wrote: You need to be able to form buckets that can be reduced again, either using the aggregations framework or a query. One model that will allow you to do that is something like this: { "userid": "xyz", "path": "/sale/B", "previous_paths": [...], "tstamp": ..., ... } So whenever you add a new path, you denormalize and add previous paths that could be relevant. This might bloat your storage a bit and be slower on writes, but it is very optimized for reads, since now you can do an aggregation that queries for the desired path and buckets on the user. To check the condition of the previous path you should be able to bucket again using a script, or maybe even with a query on a nested type. This is just off the top of my head but should definitely work if you can get to that model -- Itamar Syn-Hershko http://code972.com | @synhershko https://twitter.com/synhershko Freelance Developer & Consultant Author of RavenDB in Action http://manning.com/synhershko/ ...
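The O(N^2) storage cost of the previous_paths model is easy to see in a small sketch: for a session of N events, the i-th document carries i-1 previous paths, so the total number of stored path entries grows as N(N-1)/2. This is illustrative code, not the actual indexing pipeline:

```python
def denormalize(paths):
    """Turn one user's ordered path list into documents carrying previous_paths,
    as in the denormalized model suggested in the thread."""
    docs, prev = [], []
    for p in paths:
        docs.append({"path": p, "previous_paths": list(prev)})
        prev.append(p)
    return docs

session = ["/promo/A", "/page/1", "/page/2", "/sale/B"]
docs = denormalize(session)
total = sum(len(d["previous_paths"]) for d in docs)
print(total)  # → 6, i.e. N(N-1)/2 for N = 4
```

With a median session length of 3 this overhead is negligible (3 extra entries per user), but at length 1000 it is roughly 500,000 entries per user, which is why the approach stops scaling.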
Re: Inter-document Queries
Yes. I can re-index the data or transform it in any way to make this query efficient. What would you suggest? On Wednesday, June 4, 2014 2:14:09 PM UTC-7, Itamar Syn-Hershko wrote: This model is not efficient for this type of querying. You cannot do this in one query using this model, and the pre-processing work you do now + traversing all documents is very costly. Is it possible for you to index the data (even as a projection) into Elasticsearch using a different model, so you can use ES properly using queries or the aggregations framework? -- Itamar Syn-Hershko http://code972.com | @synhershko https://twitter.com/synhershko Freelance Developer & Consultant Author of RavenDB in Action http://manning.com/synhershko/ ...