Re: spark version, elasticsearch-hadoop version, akka version sync up
Thank you for the summary - you are confirming (as a sanity check for myself):

- elasticsearch-hadoop Beta3 (not snapshot) on Spark core 1.1 only
- elasticsearch-hadoop Beta3-SNAPSHOT with Spark core 1.1, 1.2 and 1.3 -- as long as I don't use Spark SQL when using 1.2 and 1.3

Costin - I am amazed by your ability to keep all this straight - my head would explode dealing with all the dependencies in flux. Kudos to you.

On Tuesday, March 17, 2015 at 2:12:06 PM UTC-7, Costin Leau wrote:
>
> es-hadoop doesn't depend on Akka, only on Spark. The Scala version that
> es-hadoop is compiled against matches the one used by the Spark version
> it is compiled against for each release - typically this shouldn't pose
> a problem.
>
> Unfortunately, despite the minor version increments, some of the Spark
> APIs or components (in particular Spark SQL) have changed drastically
> between releases, breaking backwards compatibility. For example, Beta3
> works up to Spark 1.1 (which was the latest stable release at the time
> it shipped) but not with 1.2. This is fixed in master; however, the
> current dev build doesn't work with Spark SQL in the newly released 1.3
> (it does work with Spark core).
>
> This has already been fixed locally; however, I'm having difficulty
> preserving compatibility across both the Spark SQL 1.2 and 1.3 releases.
>
> Long story short, as long as the dependencies for Spark are in order,
> the same should apply for es-hadoop as well, since it relies only on
> Spark (and Scala, of course).
>
> On Tue, Mar 17, 2015 at 10:43 PM, Jeff Steinmetz wrote:
>
>> There are plenty of Spark / Akka / Scala / elasticsearch-hadoop
>> dependencies to keep track of.
>>
>> Is it true that elasticsearch-hadoop needs to be compiled for a
>> specific Spark version to run correctly on the cluster? I'm also
>> trying to keep track of the Akka version and Scala version. I.e.,
>> will es-hadoop compiled for Spark 1.2 work with Spark 1.3?
>>
>> When the elasticsearch-hadoop versions are released, as v2.0, v2.1,
>> v2.1.0.Beta3, at what point do we need to keep in mind which Spark
>> version it was compiled against?
>> I.e., is it safe to assume the es-hadoop versions are tied to a
>> specific Spark core version?
>>
>> I've been keeping the following chart in my notes to track all the
>> versions and dependencies:
>>
>> Akka Version Dependencies
>> Current Akka Stable Release: 2.3.9
>>
>> Elasticsearch-Hadoop 2.1.0.Beta3 = Spark 1.1.0
>> Elasticsearch-Hadoop 2.1.0.Beta3-SNAPSHOT = Spark 1.2.1
>> Elasticsearch-Hadoop: what about Spark 1.3?
>>
>> Spark 1.3, Akka 2.3.4-spark
>> Spark 1.2, Akka 2.3.4-spark
>> Spark 1.1, Akka 2.2.3-shaded-protobuf
>>
>> Activator 1.2.12 comes with Akka 2.3.4
>>
>> Play 2.3.8, Akka 2.3.4, Scala 2.11.1 (will also work with 2.10.4)
>> Play 2.2.x, Akka 2.2.0
>>
>> Spark Job Server 0.4.1, Spark Core 1.1.0, Akka 2.2.4
>> Spark Job Server master as of Feb 22, 2015: Spark Core 1.2.0,
>> Akka 2.3.4, Scala 2.10.4
>>
>> Akka Persistence: latest, 2.3.4 or later
>> Akka 2.3.9 is released for Scala 2.10.4 and 2.11.5
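For anyone pinning the combination confirmed above in a build, a minimal build.sbt sketch (the group/artifact coordinates are the standard Maven Central ones for these projects; versions follow this thread - adjust to your cluster):

    scalaVersion := "2.10.4"

    libraryDependencies ++= Seq(
      // Spark core 1.1.x pairs with elasticsearch-hadoop 2.1.0.Beta3 per this thread
      "org.apache.spark"  %% "spark-core"           % "1.1.0" % "provided",
      "org.elasticsearch" %  "elasticsearch-hadoop" % "2.1.0.Beta3"
    )

Marking spark-core as "provided" keeps the assembly from shipping a second Spark alongside the one already on the cluster.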
spark version, elasticsearch-hadoop version, akka version sync up
There are plenty of Spark / Akka / Scala / elasticsearch-hadoop dependencies to keep track of.

Is it true that elasticsearch-hadoop needs to be compiled for a specific Spark version to run correctly on the cluster? I'm also trying to keep track of the Akka version and Scala version. I.e., will es-hadoop compiled for Spark 1.2 work with Spark 1.3?

When the elasticsearch-hadoop versions are released, as v2.0, v2.1, v2.1.0.Beta3, at what point do we need to keep in mind which Spark version each was compiled against?
I.e., is it safe to assume the es-hadoop versions are tied to a specific Spark core version?

I've been keeping the following chart in my notes to track all the versions and dependencies:

Akka Version Dependencies
Current Akka Stable Release: 2.3.9

Elasticsearch-Hadoop 2.1.0.Beta3 = Spark 1.1.0
Elasticsearch-Hadoop 2.1.0.Beta3-SNAPSHOT = Spark 1.2.1
Elasticsearch-Hadoop: what about Spark 1.3?

Spark 1.3, Akka 2.3.4-spark
Spark 1.2, Akka 2.3.4-spark
Spark 1.1, Akka 2.2.3-shaded-protobuf

Activator 1.2.12 comes with Akka 2.3.4

Play 2.3.8, Akka 2.3.4, Scala 2.11.1 (will also work with 2.10.4)
Play 2.2.x, Akka 2.2.0

Spark Job Server 0.4.1, Spark Core 1.1.0, Akka 2.2.4
Spark Job Server master as of Feb 22, 2015: Spark Core 1.2.0, Akka 2.3.4, Scala 2.10.4

Akka Persistence: latest, 2.3.4 or later
Akka 2.3.9 is released for Scala 2.10.4 and 2.11.5
Re: function_score weight of 0.0 returns 1
BTW, I am running against Elasticsearch version 1.4.4

On Monday, February 23, 2015 at 6:49:59 PM UTC-8, Jeff Steinmetz wrote:
>
> Any ideas why this would return a score of 1 when the last (bool
> must_not) filter matches, even though the weight is set to 0.0?
>
> As a bit of background, this provides custom scoring where:
> - if one field matches, score 0.35
> - if another, more important field matches, score 0.65
> - if both match, score 1.0 (the sum)
> - if neither matches, score 0
>
> If I change the no-match weight to -1 or something very small, like
> 0.01, the weight for the must_not match comes back accurately.
> I am thinking this is some kind of divide-by-zero error handler in ES
> that returns 1 when a score hits 0?
>
> GET test/document/_search
> {
>   "query": {
>     "function_score": {
>       "filter": { "term": { "type": "animal" } },
>       "functions": [
>         {
>           "filter": {
>             "bool": {
>               "must": { "term": { "content": "fish" } }
>             }
>           },
>           "weight": 0.35
>         },
>         {
>           "filter": {
>             "bool": {
>               "must": { "term": { "user_description": "fish" } }
>             }
>           },
>           "weight": 0.65
>         },
>         {
>           "filter": {
>             "bool": {
>               "must_not": [
>                 { "term": { "user_description": "fish" } },
>                 { "term": { "content": "fish" } }
>               ]
>             }
>           },
>           "weight": 0.0
>         }
>       ],
>       "score_mode": "sum"
>     }
>   }
> }
function_score weight of 0.0 returns 1
Any ideas why this would return a score of 1 when the last (bool must_not) filter matches, even though the weight is set to 0.0?

As a bit of background, this provides custom scoring where:
- if one field matches, score 0.35
- if another, more important field matches, score 0.65
- if both match, score 1.0 (the sum)
- if neither matches, score 0

If I change the no-match weight to -1 or something very small, like 0.01, the weight for the must_not match comes back accurately.
I am thinking this is some kind of divide-by-zero error handler in ES that returns 1 when a score hits 0?

GET test/document/_search
{
  "query": {
    "function_score": {
      "filter": { "term": { "type": "animal" } },
      "functions": [
        {
          "filter": {
            "bool": {
              "must": { "term": { "content": "fish" } }
            }
          },
          "weight": 0.35
        },
        {
          "filter": {
            "bool": {
              "must": { "term": { "user_description": "fish" } }
            }
          },
          "weight": 0.65
        },
        {
          "filter": {
            "bool": {
              "must_not": [
                { "term": { "user_description": "fish" } },
                { "term": { "content": "fish" } }
              ]
            }
          },
          "weight": 0.0
        }
      ],
      "score_mode": "sum"
    }
  }
}
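For anyone reproducing this from the Java client instead of the REST API, a rough Scala equivalent of the query above (a sketch only, assuming the ES 1.4.x Java client, where ScoreFunctionBuilders.weightFactorFunction carries the "weight" parameter; adjust builder names to your client version):

    import org.elasticsearch.index.query.{FilterBuilders, QueryBuilders}
    import org.elasticsearch.index.query.functionscore.ScoreFunctionBuilders.weightFactorFunction

    // same three weighted filters as above, combined with score_mode "sum"
    val query = QueryBuilders.functionScoreQuery(FilterBuilders.termFilter("type", "animal"))
      .add(FilterBuilders.termFilter("content", "fish"), weightFactorFunction(0.35f))
      .add(FilterBuilders.termFilter("user_description", "fish"), weightFactorFunction(0.65f))
      .add(FilterBuilders.boolFilter()
             .mustNot(FilterBuilders.termFilter("user_description", "fish"))
             .mustNot(FilterBuilders.termFilter("content", "fish")),
           weightFactorFunction(0.0f))   // the 0.0 weight in question
      .scoreMode("sum")

This builds the same request, so it should reproduce the same weight-0.0 behavior described above.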
Access nested keys in Groovy
Is there a way to easily check for the existence of a key, and subsequently access keys, in a complex object (nested, maybe at times 3 levels deep, unfortunately)?

Simplified example doc:

{
  "level1" : {
    "level2" : {
      "links" : ["http://bit.ly/abc", "http://bit.ly/abc", "http://bit.ly/def", "http://bit.ly/ghi"]
    }
  },
  "title" : "test123"
}

This doesn't seem to work (from what I can tell it never finds the key level1.level2.links even when it does exist):

"script" : "if (ctx._source.containsKey('level1.level2.links')) { ctx._source.links_url_count = ctx._source['level1.level2.links'].size() } else { ctx._source.links_url_count = 0 }"

Simple keys work though, like ctx._source.containsKey('title').

Does it require intermediate checks for every level (which could be a messy script)?
Re: counting items in a list [array] returns (what we think) are incorrect counts via groovy
Now that I am into the real world scenario, it gets a bit trickier - I have nested objects (keys). I have to test for the existence of the key in the Groovy script to avoid parsing errors on insert.

How do you access a nested object in Groovy, and test for the existence of a nested object key? Such as in this example:

curl -XPOST 'http://'$NODE':9200/'$INDEX_NAME'/post' -d '{
  "titles": ["title 1", "title 2", "title 3", "title 4"],
  "raw" : {
    "links" : ["http://bit.ly/abc", "http://bit.ly/abc", "http://bit.ly/def", "http://bit.ly/ghi"]
  }
}'

This doesn't seem to work (from what I can tell it never finds the key raw.links even when it does exist):

"script" : "if (ctx._source.containsKey('raw.links')) { ctx._source.links_url_count = ctx._source['raw.links'].size() } else { ctx._source.links_url_count = 0 }"

Simple keys work though, like ctx._source.containsKey('title').

On Thursday, January 8, 2015 at 9:59:56 PM UTC-8, Nikolas Everett wrote:
>
> Transform never saves to source. You have to transform on the
> application side for that. It was designed for times when you want to
> index something like this that would just take up extra space in the
> source document. I imagine you could use a script field on the query if
> you need the result to contain the count. Or just count it on the
> result side.
>
> Nik
> On Jan 9, 2015 12:43 AM, "Jeff Steinmetz" wrote:
>
>> Transform worked well. Nice.
>>
>> Curious how to get it to save to source? Tried this below, no go. (I
>> can however do range queries against title_count, so the transform was
>> indexed and works well.)
>>
>> "transform" : {
>>   "script" : "ctx._source['title_count'] = ctx._source['titles'].size()",
>>   "lang" : "groovy"
>> },
>> "properties" : {
>>   "titles" : { "type" : "string", "index" : "not_analyzed" },
>>   "title_count" : { "type" : "integer", "store" : "yes" }
>> }
>> }'
>>
>> On Thursday, January 8, 2015 at 9:15:28 PM UTC-8, Nikolas Everett wrote:
>>>
>>> Source is going to be pretty slow, yeah. If it's a one-off then it's
>>> probably fine, but if you do it a lot it's probably best to index the
>>> count.
>>> On Jan 9, 2015 12:04 AM, "Jeff Steinmetz" wrote:
>>>
>>>> Thank you, that worked.
>>>>
>>>> I was curious about the speed - is running a script using _source
>>>> slower than doc[]?
>>>>
>>>> Totally understand a dynamic script is slower regardless of _source
>>>> vs doc[].
>>>>
>>>> Makes sense that having a count transformed up front during index to
>>>> create a materialized value would certainly be much faster.
>>>>
>>>> On Thursday, January 8, 2015 at 7:04:40 PM UTC-8, Nikolas Everett wrote:
>>>>>
>>>>> On Thu, Jan 8, 2015 at 9:09 PM, Jeff Steinmetz wrote:
>>>>>
>>>>>> Is there a better way to do this?
>>>>>>
>>>>>> Please see this gist (or even better yet, run the script locally
>>>>>> to see the issue):
>>>>>>
>>>>>> https://gist.github.com/jeffsteinmetz/2ea8329c667386c80fae
>>>>>>
>>>>>> You must have scripting enabled in your elasticsearch config for
>>>>>> this to work.
>>>>>>
>>>>>> This was originally based on some comments I found here:
>>>>>> http://stackoverflow.com/questions/17314123/search-by-size-of-object-type-field-elastic-search
>>>>>>
>>>>>> We would like to use a filtered query to only include documents
>>>>>> that have a small count of items in the list [aka array],
>>>>>> filtering where values.size() < 10:
>>>>>>
>>>>>> "script": "doc['titles'].values.size() < 10"
>>>>>>
>>>>>> Turns out values.size() either counts tokenized (analyzed) words
>>>>>> or, if the mapping turns off analysis, it still counts incorrectly
>>>>>> if there are duplicates.
Re: counting items in a list [array] returns (what we think) are incorrect counts via groovy
Transform worked well. Nice.

Curious how to get it to save to source? Tried this below, no go. (I can however do range queries against title_count, so the transform was indexed and works well.)

"transform" : {
  "script" : "ctx._source['title_count'] = ctx._source['titles'].size()",
  "lang" : "groovy"
},
"properties" : {
  "titles" : { "type" : "string", "index" : "not_analyzed" },
  "title_count" : { "type" : "integer", "store" : "yes" }
}
}'

On Thursday, January 8, 2015 at 9:15:28 PM UTC-8, Nikolas Everett wrote:
>
> Source is going to be pretty slow, yeah. If it's a one-off then it's
> probably fine, but if you do it a lot it's probably best to index the
> count.
> On Jan 9, 2015 12:04 AM, "Jeff Steinmetz" wrote:
>
>> Thank you, that worked.
>>
>> I was curious about the speed - is running a script using _source
>> slower than doc[]?
>>
>> Totally understand a dynamic script is slower regardless of _source vs
>> doc[].
>>
>> Makes sense that having a count transformed up front during index to
>> create a materialized value would certainly be much faster.
>>
>> On Thursday, January 8, 2015 at 7:04:40 PM UTC-8, Nikolas Everett wrote:
>>>
>>> On Thu, Jan 8, 2015 at 9:09 PM, Jeff Steinmetz wrote:
>>>
>>>> Is there a better way to do this?
>>>>
>>>> Please see this gist (or even better yet, run the script locally to
>>>> see the issue):
>>>>
>>>> https://gist.github.com/jeffsteinmetz/2ea8329c667386c80fae
>>>>
>>>> You must have scripting enabled in your elasticsearch config for
>>>> this to work.
>>>>
>>>> This was originally based on some comments I found here:
>>>> http://stackoverflow.com/questions/17314123/search-by-size-of-object-type-field-elastic-search
>>>>
>>>> We would like to use a filtered query to only include documents that
>>>> have a small count of items in the list [aka array], filtering where
>>>> values.size() < 10:
>>>>
>>>> "script": "doc['titles'].values.size() < 10"
>>>>
>>>> Turns out values.size() either counts tokenized (analyzed) words or,
>>>> if the mapping turns off analysis, it still counts incorrectly if
>>>> there are duplicates.
>>>> If analyze is not turned off, it counts tokenized words, not the
>>>> number of elements in the list.
>>>> If analyze is turned off for a given field, it improves, but
>>>> duplicates are missed.
>>>>
>>>> For example, this comes back as size == 2:
>>>> "titles": ["one", "duplicate", "duplicate"]
>>>> This comes back as size == 3, but should be 4:
>>>> "titles": ["http://bit.ly/abc", "http://bit.ly/abc", "http://bit.ly/def", "http://bit.ly/ghi"]
>>>>
>>>> Is this a bug, is there a better way, or is this just something that
>>>> we don't understand about Groovy and values.size()?
>>>>
>>> I think that's just the way doc[] works. Try (but don't actually
>>> deploy) _source['titles'].size() < 10. That should do what you
>>> expect. Don't deploy that because it's too slow. Try indexing the
>>> size and filtering on it. You can use a transform to add the size of
>>> the array as an integer field and just filter on it using a range
>>> filter. That'd probably be the fastest option.
>>>
>>> Nik
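Since transform output never lands in _source, the query-time alternative Nik mentions is a script field. A sketch (assuming an already-connected ES 1.x Java Client named client, dynamic scripting enabled, and a made-up index name):

    // compute the count per hit as a script field instead of storing it
    val response = client.prepareSearch("myindex")
      .addScriptField("title_count", "groovy", "_source['titles'].size()", null)
      .execute()
      .actionGet()

As noted in the thread, _source access in scripts is slow, so the indexed title_count from the transform remains the better option when you need to filter on the count.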
Re: counting items in a list [array] returns (what we think) are incorrect counts via groovy
Thank you, that worked.

I was curious about the speed - is running a script using _source slower than doc[]?

Totally understand a dynamic script is slower regardless of _source vs doc[].

Makes sense that having a count transformed up front during index to create a materialized value would certainly be much faster.

On Thursday, January 8, 2015 at 7:04:40 PM UTC-8, Nikolas Everett wrote:
>
> On Thu, Jan 8, 2015 at 9:09 PM, Jeff Steinmetz wrote:
>
>> Is there a better way to do this?
>>
>> Please see this gist (or even better yet, run the script locally to
>> see the issue):
>>
>> https://gist.github.com/jeffsteinmetz/2ea8329c667386c80fae
>>
>> You must have scripting enabled in your elasticsearch config for this
>> to work.
>>
>> This was originally based on some comments I found here:
>> http://stackoverflow.com/questions/17314123/search-by-size-of-object-type-field-elastic-search
>>
>> We would like to use a filtered query to only include documents that
>> have a small count of items in the list [aka array], filtering where
>> values.size() < 10:
>>
>> "script": "doc['titles'].values.size() < 10"
>>
>> Turns out values.size() either counts tokenized (analyzed) words or,
>> if the mapping turns off analysis, it still counts incorrectly if
>> there are duplicates.
>> If analyze is not turned off, it counts tokenized words, not the
>> number of elements in the list.
>> If analyze is turned off for a given field, it improves, but
>> duplicates are missed.
>>
>> For example, this comes back as size == 2:
>> "titles": ["one", "duplicate", "duplicate"]
>> This comes back as size == 3, but should be 4:
>> "titles": ["http://bit.ly/abc", "http://bit.ly/abc", "http://bit.ly/def", "http://bit.ly/ghi"]
>>
>> Is this a bug, is there a better way, or is this just something that
>> we don't understand about Groovy and values.size()?
>>
> I think that's just the way doc[] works. Try (but don't actually
> deploy) _source['titles'].size() < 10. That should do what you expect.
> Don't deploy that because it's too slow. Try indexing the size and
> filtering on it. You can use a transform to add the size of the array
> as an integer field and just filter on it using a range filter. That'd
> probably be the fastest option.
>
> Nik
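Once the count is indexed via the transform, the filtering side is cheap. A sketch of the range filter Nik describes, on the ES 1.x Java client (title_count being the integer field the transform populates):

    import org.elasticsearch.index.query.{FilterBuilders, QueryBuilders}

    // only documents whose indexed title_count is below 10
    val query = QueryBuilders.filteredQuery(
      QueryBuilders.matchAllQuery(),
      FilterBuilders.rangeFilter("title_count").lt(10))

The range filter runs against the indexed value, so no script executes at query time.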
counting items in a list [array] returns (what we think) are incorrect counts via groovy
Is there a better way to do this?

Please see this gist (or even better yet, run the script locally to see the issue):

https://gist.github.com/jeffsteinmetz/2ea8329c667386c80fae

You must have scripting enabled in your elasticsearch config for this to work.

This was originally based on some comments I found here:
http://stackoverflow.com/questions/17314123/search-by-size-of-object-type-field-elastic-search

We would like to use a filtered query to only include documents that have a small count of items in the list [aka array], filtering where values.size() < 10:

"script": "doc['titles'].values.size() < 10"

Turns out values.size() either counts tokenized (analyzed) words or, if the mapping turns off analysis, it still counts incorrectly if there are duplicates.
If analyze is not turned off, it counts tokenized words, not the number of elements in the list.
If analyze is turned off for a given field, it improves, but duplicates are missed.

For example, this comes back as size == 2:
"titles": ["one", "duplicate", "duplicate"]
This comes back as size == 3, but should be 4:
"titles": ["http://bit.ly/abc", "http://bit.ly/abc", "http://bit.ly/def", "http://bit.ly/ghi"]

Is this a bug, is there a better way, or is this just something that we don't understand about Groovy and values.size()?
Re: ElasticSearch spark esRDD not returning the aggregate values in aggregated query
Siva,

Try the latest build of elasticsearch-hadoop, ver 2.1.0.Beta2:
http://www.elasticsearch.org/overview/hadoop/download/

The esRDD has been changed to Spark's PairRDD:
https://spark.apache.org/docs/1.1.0/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions

The RDD will now contain key/value tuples that look like (String, Map[String, Any]), so you could start to walk the JSON key/value hierarchy with something like:

esRDD.flatMap { args => args._2.get("aggregations") }

(The syntax above is not exact, since your specific query result may have a different key/value pair as its first object.)

Best,
Jeff Steinmetz
Director of Data Science
Ekho, Inc.
www.ekho.me
@jeffsteinmetz

On Wednesday, September 17, 2014 6:13:37 AM UTC-7, siva pradeep wrote:
>
> Hi,
>
> I have a query which filters the rows and then applies the aggregation.
> I tried running the query in "Sense" and it gave me the expected
> result. But when I try to run the same query using
> elasticsearch-spark_2.10, I get the rows filtered by the query but not
> the aggregation result. I am sure I am missing something but am unable
> to figure out what.
>
> Here is the query:
>
> GET _search
> {
>   "query" : {
>     "bool": {
>       "must": [
>         {
>           "filtered": {
>             "query": {
>               "range": {
>                 "@timestamp": {
>                   "from": "2014-09-03T01:40:37.437Z",
>                   "to": "2014-09-03T01:45:11.437Z"
>                 }
>               }
>             }
>           }
>         }
>       ]
>     }
>   },
>
>   "size": 0,
>
>   "fields": ["cid", "entity"],
>   "aggs": {
>     "cid": {
>       "terms": {
>         "field": "cid",
>         "min_doc_count": 2,
>         "size": 100
>       },
>       "aggs": {
>         "tn": {
>           "terms": {
>             "field": "entity"
>           }
>         }
>       }
>     }
>   }
> }
>
> Query result:
>
> {
>   "took": 10005,
>   "timed_out": false,
>   "_shards": {
>     "total": 10,
>     "successful": 10,
>     "failed": 0
>   },
>   "hits": {
>     "total": 2430,
>     "max_score": 0,
>     "hits": []
>   },
>   "aggregations": {
>     "cid": {
>       "buckets": [
>         {
>           "key": "01abcecc9a20cd3d6ae6be3509d014ba@76.96.107.168",
>           "doc_count": 2,
>           "tn": {
>             "buckets": [
>               {
>                 "key": "15052563268",
>                 "doc_count": 2
>               }
>             ]
>           }
>         }
>       ]
>     }
>   }
> }
>
> Spark program:
>
> object PresenceFilter extends App {
>
>   val query: String = "{ \"query\" : { \"bool\": { \"must\": [ { \"filtered\": { \"query\": { \"range\": { \"@timestamp\": { \"from\": \"2014-09-03T01:40:37.437Z\", \"to\": \"2014-09-03T01:45:11.437Z\" } } } } } ] } }, \"size\": 0, \"fields\": [\"cid\",\"entity\"], \"aggs\": { \"cid\": { \"terms\": { \"field\": \"cid\", \"min_doc_count\": 2, \"size\": 100 }, \"aggs\": { \"tn\": { \"terms\": { \"field\": \"entity\" } } } } } }"
>
>   val sparkConf = new SparkConf()
>     .setAppName("PresenceAnalysis")
>     .setMaster("local[4]")
>     .set("es.nodes", "prs-wch-10.sys.comcast.net")
>     .set("es.port", "9200")
>     .set
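To make the new PairRDD shape concrete, a short sketch (assuming es-hadoop 2.1.0.Beta2+, an existing SparkContext named sc, a made-up index/type name, and the query string from the thread in a variable named query):

    import org.elasticsearch.spark._  // adds esRDD to SparkContext

    // each element is (documentId, Map[fieldName -> value])
    val rdd = sc.esRDD("myindex/mytype", query)
    rdd.take(5).foreach { case (id, fields) =>
      println(s"$id -> ${fields.keys.mkString(", ")}")
    }

Inspecting the keys of the returned maps this way is a quick check of what the connector actually hands back for a given query.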
Re: org.elasticsearch.search.aggregations docs
Thank you. Although I was specifically talking about documentation for the Java search API.

For example, there is this:
http://www.elasticsearch.org/guide/en/elasticsearch/client/java-api/current/java-facets.html

But I haven't found anything that covers the Aggregations replacement.

On Monday, August 11, 2014 4:08:41 PM UTC-7, Isabel Drost-Fromm wrote:
>
> On Mon, Aug 11, 2014 at 10:47 PM, Jeff Steinmetz wrote:
>
>> I've been looking all over the place for documentation on
>>
>> org.elasticsearch.search.aggregations._
>>
>
> There's quite a bit of information in the online docs:
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations.html
>
> Hope this helps you.
>
> Isabel
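Absent Java API docs for aggregations at the time, a small sketch of the aggregations equivalent of a terms facet on the ES 1.x Java client (client is an existing Client; index and field names are made up):

    import scala.collection.JavaConverters._
    import org.elasticsearch.search.aggregations.AggregationBuilders
    import org.elasticsearch.search.aggregations.bucket.terms.Terms

    // terms aggregation - the rough replacement for a terms facet
    val response = client.prepareSearch("myindex")
      .setSize(0)
      .addAggregation(AggregationBuilders.terms("by_user").field("userid").size(10))
      .execute()
      .actionGet()

    // fetch the result by name, typed to the expected aggregation class
    val byUser: Terms = response.getAggregations.get("by_user")
    for (bucket <- byUser.getBuckets.asScala)
      println(s"${bucket.getKey}: ${bucket.getDocCount}")

The pattern generalizes: build with AggregationBuilders, attach via addAggregation, then read the typed result out of response.getAggregations.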
Creating filters per aggregation similar to Facets
Kibana provides a good example of date histograms, split out by each "query" entered at the top in the "Query" bar. It essentially creates multiple free-text queries against _all. I see it generates a per-facet filter with a free-text (query_string) search.

Since facets are to be deprecated, I am now only using aggregations (in a custom application - unrelated to Kibana). I have tried this with aggregations without success. I also realize there is something new coming in 1.4, but I assume that with multiple aggregations (vs. multiple filters to create multiple buckets) I can do this today.

Here is an oversimplified version of the date histogram aggregation I have (without the leading query section - consider it pseudo code). The "filter" section is the part in question. Removing the filter works; I have tried all types of "filter" formats, looked for samples, etc., with no luck. I have tried {"all" : "search term"} as well as {"query_string": { "all" : "search term" }}. I've tried a specific field name, etc. None of my attempts are proving fruitful.

Pseudo example using aggregations:

"aggregations" : {
  "0" : {
    "date_histogram" : {
      "filter" : { "query_string" : { "query" : "Intel" } },
      "field" : "created_at",
      "interval" : "1d",
      "min_doc_count" : 0
    }
  },
  "1" : {
    "date_histogram" : {
      "filter" : { "query_string" : { "query" : "Samsung" } },
      "field" : "created_at",
      "interval" : "1d",
      "min_doc_count" : 0,
      "pre_zone" : "-02:00",
      "post_zone" : "-03:30"
    }
  }
}

Here is the facet version (which works - note filtered/query/query_string/query):

{
  "facets": {
    "0": {
      "date_histogram": {
        "field": "created_at",
        "interval": "3h"
      },
      "global": true,
      "facet_filter": {
        "fquery": {
          "query": {
            "filtered": {
              "query": {
                "query_string": { "query": "Intel" }
              },
              "filter": {
                "bool": {
                  "must": [
                    { "terms": { "userid": ["53d02d6aed9597f3c6fa"] } },
                    { "range": { "created_at": { "from": "now-30d", "to": "now" } } }
                  ]
                }
              }
            }
          }
        }
      }
    },
    "1": {
      "date_histogram": {
        "field": "created_at",
        "interval": "3h"
      },
      "global": true,
      "facet_filter": {
        "fquery": {
          "query": {
            "filtered": {
              "query": {
                "query_string": { "query": "Samsung" }
              },
              "filter": {
                "bool": {
                  "must": [
                    { "terms": { "userid": ["53d02d6aed9597f3c6fa"] } },
                    { "range": { "created_at": { "from": "now-30d", "to": "now" } } }
                  ]
                }
              }
            }
          }
        }
      }
    }
  }
}
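The shape that works in 1.x is to make the filter its own bucketing aggregation and nest the date_histogram under it, rather than putting "filter" inside the date_histogram body. A sketch on the ES 1.x Java client (aggregation and field names taken from the examples above; everything else assumed):

    import org.elasticsearch.index.query.{FilterBuilders, QueryBuilders}
    import org.elasticsearch.search.aggregations.AggregationBuilders
    import org.elasticsearch.search.aggregations.bucket.histogram.DateHistogram

    // one filter bucket per free-text query, each with its own date_histogram
    def perQueryHistogram(name: String, queryString: String) =
      AggregationBuilders.filter(name)
        .filter(FilterBuilders.queryFilter(QueryBuilders.queryString(queryString)))
        .subAggregation(
          AggregationBuilders.dateHistogram(name + "_over_time")
            .field("created_at")
            .interval(DateHistogram.Interval.DAY)
            .minDocCount(0))

    val aggs = Seq(perQueryHistogram("0", "Intel"), perQueryHistogram("1", "Samsung"))

The same nesting applies to the JSON form: a top-level "filter" aggregation wrapping the query_string, with the date_histogram in its "aggs" block, mirroring what facet_filter did for facets.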
org.elasticsearch.search.aggregations docs
I've been looking all over the place for documentation on org.elasticsearch.search.aggregations._

The Java API 1.x docs state Facets will be deprecated. I am using the source code on GitHub for reference. I also see that Kibana does all its queries using Facets.

I wanted to make sure I wasn't missing something. Searched everywhere.