subject:"aggregations"

Sum aggregation with results from other aggregations?

2015-06-01 Thread Josh Harrison

Is it possible to create an aggregation where I can do a sum on the results 
of a sub bucket?

I'm working on Twitter data. In this data I have a bunch of retweets of 
different users.
Say that user A has 10 tweets that are retweeted a hundred times in my 
dataset. I want to find the maximum retweet_count for each individual 
tweet, and then I want to find the sum of all of those maximums for an 
individual user.
This is the base query structure I'm working with:

{
  "aggs": {
    "user_id": {
      "terms": {
        "field": "retweet_user_id"
      },
      "aggs": {
        "tweet_ids": {
          "terms": {
            "field": "retweet_id",
            "order": { "max_tweet": "desc" }
          },
          "aggs": {
            "max_tweet": {
              "max": {
                "field": "retweet_count"
              }
            }
          }
        }
      }
    }
  }
}



Importantly here, I don't want to just take a sum of "retweet_count" for a 
given retweet_user_id - this doesn't give the max value per tweet.


Essentially, is it possible for me to take a sum of the agg results at 
user_id.tweet_ids.max_tweet.value, and use that as an "order" term in the 
user_id terms agg?
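
Until something like this is available on the server side, the sum can be computed on the client from the response of the query above. This is a minimal Python sketch, not a definitive answer: it assumes the response has the usual aggregations -> user_id -> buckets -> tweet_ids -> buckets -> max_tweet.value shape, and the helper name sum_of_maxima and the sample data are made up for illustration.

```python
def sum_of_maxima(response):
    """Return (user_id, summed max retweet_count) pairs, largest sum first."""
    totals = []
    for user in response["aggregations"]["user_id"]["buckets"]:
        # Each tweet bucket carries the max retweet_count seen for that tweet.
        total = sum(t["max_tweet"]["value"] for t in user["tweet_ids"]["buckets"])
        totals.append((user["key"], total))
    # Order users by the summed maxima, which is the ordering asked for above.
    return sorted(totals, key=lambda pair: pair[1], reverse=True)


# Hypothetical response fragment, trimmed to the fields used here.
sample = {
    "aggregations": {
        "user_id": {
            "buckets": [
                {"key": "A", "tweet_ids": {"buckets": [
                    {"key": "t1", "max_tweet": {"value": 100.0}},
                    {"key": "t2", "max_tweet": {"value": 20.0}},
                ]}},
                {"key": "B", "tweet_ids": {"buckets": [
                    {"key": "t3", "max_tweet": {"value": 150.0}},
                ]}},
            ]
        }
    }
}

print(sum_of_maxima(sample))
```

Note that a terms aggregation only returns the top buckets up to its "size", so the sums cover only the tweets that actually come back in the response.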



-- 
Please update your bookmarks! We have moved to https://discuss.elastic.co/
--- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/100b463d-0f95-4801-aec7-e32544624518%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

How can I filter these aggregations properly?

2015-05-17 Thread Nathan Pearson

I'm trying to build a faceted navigation (similar to Zappos or Amazon).

I posted a detailed summary on Stack Overflow here:

http://stackoverflow.com/questions/30291997/how-can-i-filter-these-elasticsearch-aggregations

If you prefer, I can do a write-up here.

Thanks in advance for any help you can offer.


Re: Elastic search aggregations buckets counting email format as two different bucket key .

2015-04-15 Thread Glen Smith

There is a bucket for each indexed term for the selected field, not for 
each stored value.

Whatever tokenizer you are using on that field is dividing the text at the 
"@" symbol.

If you want buckets for the exact value of the field, you need a 
not_analyzed field.
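
To make the difference concrete, here is a small Python sketch that imitates the effect. It is only an illustration: the splitting rule in analyzed_terms is a rough stand-in for an analyzer that breaks text at "@", not the real Elasticsearch analysis chain, and the address alice@example.com is a made-up sample value.

```python
import re
from collections import Counter


def analyzed_terms(value):
    # Rough stand-in for an analyzer that splits at "@" and whitespace
    # and lowercases; real analyzers are more involved.
    return [t.lower() for t in re.split(r"[@\s]+", value) if t]


def bucket_counts(values, analyzed):
    counts = Counter()
    for v in values:
        # A document is counted once per distinct indexed term.
        terms = set(analyzed_terms(v)) if analyzed else {v}
        counts.update(terms)
    return dict(counts)


emails = ["alice@example.com"]  # hypothetical stored value
print(bucket_counts(emails, analyzed=True))   # two buckets, one per token
print(bucket_counts(emails, analyzed=False))  # one bucket for the exact value
```

In a 1.x mapping, the exact-value behaviour corresponds to a string field declared with "index": "not_analyzed".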



Elastic search aggregations buckets counting email format as two different bucket key .

2015-04-15 Thread Rajesh Sindhu



I have a field stored as "us...@user.com".

Using this aggregations JSON query:

"aggregations": {
  "email-terms": {
    "terms": {
      "field": "l_obj.email",
      "size": 0,
      "shard_size": 0,
      "order": {
        "_count": "desc"
      }
    }
  }
}


I am getting this response:

"buckets" : [
  {
    "key" : "user.com",
    "doc_count" : 1
  },
  {
    "key" : "user1",
    "doc_count" : 1
  }
]

instead of

"buckets" : [
  {
    "key" : "us...@user.com",
    "doc_count" : 1
  }
]

The same issue persists for string values like user1.user2.user.com. I am 
doing a terms aggregation. Am I missing something here?


Can I search a query based on a results from aggregations?

2015-04-15 Thread Lincoln Xiong

Hi good people,

I am using Elasticsearch for log monitoring and analysis. Sometimes I 
need to use an aggregation to return the distinct values of a field while 
looking into an issue. I have always wondered whether I can run a query based 
on the results of an aggregation. For now, I use the Python API to achieve what 
I want: I store the results of the aggregation in a list and run a new search 
based on the values in the list.
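
The two-step approach described here can be sketched roughly as follows. This only shows the step that turns aggregation buckets into a follow-up query body; the aggregation name "distinct_hosts", the field "host", and the sample response are hypothetical, and sending the request to the cluster is left out.

```python
def terms_query_from_buckets(agg_response, agg_name, field):
    """Build a follow-up search body from the keys of a terms aggregation."""
    buckets = agg_response["aggregations"][agg_name]["buckets"]
    keys = [b["key"] for b in buckets]
    # A terms query matches documents whose field holds any of the keys.
    return {"query": {"terms": {field: keys}}}


# Hypothetical first-step response, trimmed to the fields used here.
first_step = {
    "aggregations": {
        "distinct_hosts": {
            "buckets": [
                {"key": "web-01", "doc_count": 12},
                {"key": "web-02", "doc_count": 7},
            ]
        }
    }
}

print(terms_query_from_buckets(first_step, "distinct_hosts", "host"))
```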

Someone suggested that I look into the nested type or the parent/child type. 
But since all my data is parsed from plain text and structured by grok in 
Logstash, I don't think I am able to build a nested type.

So you can imagine that some documents in my case will have the same value 
for a specific field. If there is a way to put the documents with the 
same value in a field together (even if I really need to reindex), I will 
try to implement it and test the performance. But is that possible in ES?

Cheers,


Re: Incorrect Aggregations returned from ES

2015-04-14 Thread Adrien Grand

Nils: It looks different from your issue since document counts are correct
here?

MC: I think it is due to
https://github.com/elastic/elasticsearch/issues/8688. Has the field behind your
`max_val` aggregation (`metric_value`) been dynamically mapped? The only way to
prevent this issue is to map fields explicitly instead of relying on dynamic
mappings.
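
As a rough illustration of what an explicit mapping could look like, here is a Python sketch that builds an index-creation body in the 1.x style. The type name "metrics" and the choice of "long" for metric_value are assumptions made for the example; the right numeric type depends on the actual data.

```python
import json

# Hypothetical explicit mapping: the type name "metrics" and the numeric
# type "long" are assumptions, only the field name comes from the thread.
mapping_body = {
    "mappings": {
        "metrics": {
            "properties": {
                "metric": {"type": "string", "index": "not_analyzed"},
                "metric_value": {"type": "long"},
            }
        }
    }
}

# This body would be sent when creating each new date-based index,
# before any document is indexed into it.
print(json.dumps(mapping_body, indent=2))
```

With date-based indices, an index template carrying the same mapping is a common way to apply it to every new index.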


Re: Incorrect Aggregations returned from ES

2015-04-14 Thread Nils Dijk

Hi,

To me this sounds a lot like an issue that was happening to me a week 
before the release of 1.0.0. This issue was related to internal memory 
reuse within Elasticsearch before the result was read out. The issue is 
documented here: https://github.com/elastic/elasticsearch/issues/5021

What I did back then was create a reproducible test that showed the issue.

I doubt it has to do with your replicas being inconsistent, especially 
since you turned off replication and then turned it back on, which copies 
the files you have in your primary to the secondaries.

-- Nils
Here is the test I created in the past: 
https://gist.github.com/thanodnl/8803745

On Wednesday, March 25, 2015 at 11:57:15 PM UTC+1, MC wrote:
>
> I am seeing some erroneous behavior in my ES cluster when performing 
> aggregations.  Originally, I thought this was specific to a histogram as 
> that is where the error first appeared (in a K3 graph - see my post 
> https://groups.google.com/forum/#!topic/elasticsearch/iY-lKjtW7PM for 
> reference) but I have been able to re-create the exception with a simple 
> max aggregation.  The details are as follows:
>
> ES Version: 1.4.4
> Topology: 5 nodes, 5 shards per index, 2 replicas
> OS: Redhat Linux
>
> To create the issue I execute the following query against the cluster:
>
> {
>   "query": {
>     "term": {
>       "metric": "used"
>     }
>   },
>   "aggs": {
>     "max_val": {
>       "max": {
>         "field": "metric_value"
>       }
>     }
>   }
> }
>
> Upon executing this query multiple times, I get different responses.  One 
> time I get the expected result:
> ...
> "took": 13,
> "timed_out": false,
> "_shards": {
> "total": 5,
> "successful": 5,
> "failed": 0
> },
> "hits": {
> "total": 11712,
> "max_score": 9.361205,
> ...
> "aggregations": { "max_val": { "value": 18096380}}
>
> whereas on another request with the same query I get the following bad 
> response:
>
> "took": 8,
> "timed_out": false,
> "_shards": {
> "total": 5,
> "successful": 5,
> "failed": 0
> },
> "hits": {
> "total": 11712,
> "max_score": 9.361205,
> ...
> "aggregations": { "max_val": { "value": 4697741490703565000}}
>
>
>
> Some possibly relevant observations:
> 1.  In my first set of tests, I was consistently getting the correct 
> results for the first 2 requests and the bad result on the 3rd request 
> (with no one else executing this query at that point in time)
> 2.  Flushing the cache did not correct the issue
> 3.  I reduced the number of replicas to 0 and was consistently getting the 
> same result (which happened to be the correct one)
> 4.  After increasing the replica count back to 2 and waiting until ES 
> reported that the replication was complete, I tried the same experiment.  
> This time, the 1st request retrieved the correct result and the next 2 
> requests retrieved incorrect results.  In this case the incorrect results 
> were not the same but were both huge and of the same order of magnitude.
>
>
> Other info:
> - The size of the index was about 3.3Gb with ~ 50M documents in it
> - This is one of many date based indices (i.e. similar to the logstash 
> index setup), but the only one in this installation that exhibited the 
> issue.  I believe we saw something similar in a UAT environment as well 
> where 1 or 2 of the indices acted in this weird manner
> - ES reported the entire cluster as green
>
>
> It seems that some shard(s)/replica(s) were being corrupted on the 
> replication and we were being routed to that one every 3rd hit.  (Is this 
> somehow correlated to the number of replicas?)
>
> So, my questions are:
>
> 1. Has anyone seen this type of behavior before?  
> 2. Can it somehow be data dependent?
> 3. Is there any way to figure out what happened/what is happening?
> 4. Why does ES report the cluster state as green?
> 5. How can I debug this?
> 6. How can I prevent/correct this?
>
>
> Any and all help/pointers would be greatly appreciated.
>
> Thanks in advance,
> MC
>
>


Nested Aggregations returns wrong counts. Any help appreciated.

2015-04-13 Thread thanuja

I have a catalog of products that I want to calculate aggregates on. The 
trouble comes when I try to do nested aggregations with a filter that has 
both nested and parent fields in it. Either it gives wrong counts or 0 
hits. Here is a sample of my product object mapping:

"Products": {
  "properties": {
    "ProductID": {
      "type": "long"
    },
    "ProductType": {
      "type": "long"
    },
    "ProductName": {
      "type": "string",
      "fields": {
        "raw": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    },
    "Prices": {
      "type": "nested",
      "properties": {
        "CurrencyType": {
          "type": "integer"
        },
        "Cost": {
          "type": "double"
        }
      }
    }
  }
}

Here is an example of the SQL query that I am trying to replicate in 
Elasticsearch:

SELECT PRODPR.Cost AS PRODPR_Cost,
       COUNT(PROD.ProductID) AS PROD_ProductID_Count
FROM Products PROD WITH (NOLOCK)
LEFT OUTER JOIN Prices PRODPR WITH (NOLOCK)
    ON (PRODPR.objectid = PROD.objectid)
WHERE PRODPR.CurrencyType = 4
  AND PROD.ProductType IN (11273, 11293, 11294)
GROUP BY PRODPR.Cost

Elasticsearch queries I came up with:

*First one (the following query returns correct counts with just CurrencyType 
as a filter, but when I add the ProductType filter, which is on a parent field, 
it gives me wrong counts):*

GET /IndexName/Products/_search
{
  "aggs": {
    "price_agg": {
      "filter": {
        "bool": {
          "must": [
            {
              "nested": {
                "path": "Prices",
                "filter": {
                  "term": {
                    "Prices.CurrencyType": "8"
                  }
                }
              }
            },
            {
              "terms": {
                "ProductType": [
                  "11273",
                  "11293",
                  "11294"
                ]
              }
            }
          ]
        }
      },
      "aggs": {
        "price_nested_agg": {
          "nested": {
            "path": "Prices"
          },
          "aggs": {
            "59316518_group_agg": {
              "terms": {
                "field": "Prices.Cost",
                "size": 0
              },
              "aggs": {
                "product_count": {
                  "reverse_nested": {},
                  "aggs": {
                    "ProductID_count_agg": {
                      "value_count": {
                        "field": "ProductID"
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}

*Second one (the following query returns correct counts with just 
CurrencyType as a filter, but when I add the ProductType filter, which is on a 
parent field, it gives me 0 hits):*

GET /IndexName/Products/_search
{
  "aggs": {
    "price_agg": {
      "nested": {
        "path": "Prices"
      },
      "aggs": {
        "currency_filter": {
          "filter": {
            "bool": {
              "must": [
                {
                  "term": {
                    "Prices.CurrencyType": "4"
                  }
                },
                {
                  "terms": {
                    "ProductType": [
                      "11273",
                      "11293"
                    ]
                  }
                }
              ]
            }
          },
          "aggs": {
            "59316518_group_agg": {
              "terms": {
                "field": "Prices.Cost",
                "size": 0
              },
              "aggs": {
                "product_count": {
                  "reverse_nested": {},
                  "aggs": {
                    "ProductID_count_agg": {
                      "value_count": {
                        "field": "ProductID"
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}

I have tried some more queries but the above two are the closest I came up 
with. Has anyone come across this use case? What am I doing wrong? Any help 
is appreciated. Thanks!
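
One arrangement that may be worth trying keeps each condition at the level where its field lives: the ProductType condition is applied to the parent documents through the query, and the CurrencyType condition is applied inside the nested aggregation. The Python sketch below only builds such a request body. It is an untested suggestion, not a confirmed fix; the helper name and the aggregation names are made up, and it assumes the 1.x "filtered" query together with the nested, filter, terms and reverse_nested aggregations.

```python
import json


def cost_counts_request(currency_type, product_types):
    """Build a search body that groups nested prices by Cost."""
    return {
        "size": 0,
        # Parent-level condition: restrict which products are considered.
        "query": {
            "filtered": {
                "filter": {"terms": {"ProductType": product_types}}
            }
        },
        "aggs": {
            "prices": {
                "nested": {"path": "Prices"},
                "aggs": {
                    "currency_filter": {
                        # Nested-level condition: only prices in this currency.
                        "filter": {"term": {"Prices.CurrencyType": currency_type}},
                        "aggs": {
                            "by_cost": {
                                "terms": {"field": "Prices.Cost", "size": 0},
                                "aggs": {
                                    # doc_count here counts parent products.
                                    "product_count": {"reverse_nested": {}}
                                },
                            }
                        },
                    }
                },
            }
        },
    }


print(json.dumps(cost_counts_request(4, [11273, 11293, 11294]), indent=2))
```

In this layout the doc_count of each product_count bucket would play the role of COUNT(PROD.ProductID) in the SQL query.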


Re: How to get aggregations working in Elasticsearch Spark adapter ?

2015-04-08 Thread Costin Leau

Facets are deprecated and will be removed; as such, there is no support for them and no plans to add it in the near 
future.


As for when aggregations will land in 2.1: in the near future. I don't want to give estimates (only to miss them), but 
let's just say it's very high priority.


Cheers,


Re: How to get aggregations working in Elasticsearch Spark adapter ?

2015-04-08 Thread michele crudele

Does anyone have an answer for this? Thanks in advance.


Re: How to get aggregations working in Elasticsearch Spark adapter ?

2015-04-01 Thread michele crudele

Thanks,

When is the 2.1 release coming?

Another question, which I think is related to this one, by the way: I was able to 
run this piece of code using facets:

val q5 = """
  |{
  |  "query": {
  |    "match": { "_all": "error" }
  |  },
  |  "facets": {
  |    "appName": { "terms": { "field": "appName" } },
  |    "sourceName": { "terms": { "field": "sourceName" } }
  |  }
  |}
  """.stripMargin


println("Query: " + q5)

val rdd = sc.esRDD("logs/app", q5)

What I get from the RDD are tuples (docID, Map[field -> value]). Should 
I also expect to find facets? If so, how do I get them?



Re: How to get aggregations working in Elasticsearch Spark adapter ?

2015-04-01 Thread Costin Leau

The short answer is that the connector relies on scan/scroll search for its core functionality. And with aggs it needs 
to switch the way it queries the cluster to a count search.
This is the last major feature that needs to be addressed before the 2.1 release. There's also an issue for it raised 
here [1] which you can track.


Cheers,

[1] https://github.com/elastic/elasticsearch-hadoop/issues/276
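
Until the connector supports this, the aggregation can be requested outside of it through the plain REST API, as the cURL command in the quoted message does. The Python sketch below only assembles the URL and body for a count-type search; the helper name is made up, the host, index and type are taken from the quoted message, and actually sending the request is left to whatever HTTP client is at hand.

```python
import json
from urllib.parse import urlencode


def count_search_request(host, index, doc_type, body):
    """Return (url, payload) for a search that skips fetching hits."""
    # search_type=count asks the 1.x search API for totals and aggregations only.
    params = urlencode({"search_type": "count"})
    url = "http://%s/%s/%s/_search?%s" % (host, index, doc_type, params)
    return url, json.dumps(body)


query = {
    "query": {"term": {"appName": "console"}},
    "aggregations": {
        "unusual": {"significant_terms": {"field": "pathname"}}
    },
}

url, payload = count_search_request("localhost:9200", "logs", "app", query)
print(url)
print(payload)
```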

On 4/1/15 12:53 PM, michele crudele wrote:


I have ES, Spark, and the ES Hadoop adapter installed on my laptop. I wrote a 
simple Scala notebook to test the ES adapter.
Everything was fine until I started thinking about more sophisticated features. 
This is the snippet that drives me crazy:

%AddJar file:///tools/elasticsearch-hadoop-2.1.0.Beta3/dist/elasticsearch-hadoop-2.1.0.BUILD-SNAPSHOT.jar
%AddJar file:///tools/elasticsearch-hadoop-2.1.0.Beta3/dist/elasticsearch-spark_2.10-2.1.0.BUILD-SNAPSHOT.jar

import org.elasticsearch.spark.rdd._

val q2 = """{
  |  "query": { "term": { "appName": "console" } },
  |  "aggregations": {
  |    "unusual": {
  |      "significant_terms": { "field": "pathname" }
  |    }
  |  }
  |}""".stripMargin

val res = sc.esRDD("logs/app", q2);

println("Matches: " + res.count())


When I run the code I get this exception:

Name: org.apache.spark.SparkException
Message: Job aborted due to stage failure: Task 2 in stage 15.0 failed 1 times, 
most recent failure: Lost task 2.0 in stage 15.0 (TID 58, localhost): 
org.apache.spark.util.TaskCompletionListenerException: 
SearchPhaseExecutionException[Failed to execute phase [init_scan], all shards 
failed; shardFailures {[N1R-UlgOQCGXCFCtbJ3sBQ][logrecords][2]: 
ElasticsearchIllegalArgumentException[aggregations are not supported with 
search_type=scan]}]
at 
org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:76)
at org.apache.spark.scheduler.Task.run(Task.scala:58)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)


"aggregations are not supported with search_type=scan", which is fine.
The question is: how do I set search_type to the right value (e.g. count) in 
the sc.esRDD() call?
I tried several places in the q2 json with no success and I was not able to 
find an answer through
the documentation. I would appreciate any help.

However, I see a possible inconsistency with the behaviour of the ES API used 
directly via cURL.
The command with the same query above, and without any setting about 
search_type works correctly:

curl 'localhost:9200/logs/app/_search?pretty' -d'{"query" : { "term": { "appName": 
"console" } },
"aggregations": { "unusual": { "significant_terms": {"field": "pathname"} }}}'

returns hits:{} and aggregations:{}. Why the Spark integration does not work 
the same ?



--
Costin


How to get aggregations working in Elasticsearch Spark adapter ?

2015-04-01 Thread michele crudele


I have ES, Spark, and the ES Hadoop adapter installed on my laptop. I wrote a 
simple Scala notebook to test the adapter.
Everything was fine until I started trying more sophisticated features. 
This is the snippet that drives me crazy:

%AddJar 
file:///tools/elasticsearch-hadoop-2.1.0.Beta3/dist/elasticsearch-hadoop-2.1.0.BUILD-SNAPSHOT.jar
%AddJar 
file:///tools/elasticsearch-hadoop-2.1.0.Beta3/dist/elasticsearch-spark_2.10-2.1.0.BUILD-SNAPSHOT.jar

import org.elasticsearch.spark.rdd._

val q2 = """{
  |  "query": { "term": { "appName": "console" } },
  |  "aggregations": {
  |    "unusual": {
  |      "significant_terms": { "field": "pathname" }
  |    }
  |  }
  |}""".stripMargin

val res = sc.esRDD("logs/app", q2);

println("Matches: " + res.count())


When I run the code I get this exception:

Name: org.apache.spark.SparkException
Message: Job aborted due to stage failure: Task 2 in stage 15.0 failed 1 times, 
most recent failure: Lost task 2.0 in stage 15.0 (TID 58, localhost): 
org.apache.spark.util.TaskCompletionListenerException: 
SearchPhaseExecutionException[Failed to execute phase [init_scan], all shards 
failed; shardFailures {[N1R-UlgOQCGXCFCtbJ3sBQ][logrecords][2]: 
ElasticsearchIllegalArgumentException[aggregations are not supported with 
search_type=scan]}]
    at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:76)
    at org.apache.spark.scheduler.Task.run(Task.scala:58)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)


"aggregations are not supported with search_type=scan", which is fine.
The question is: how do I set search_type to the right value (e.g. count) in 
the sc.esRDD() call? 
I tried several places in the q2 JSON with no success, and I could not find 
an answer in the documentation. I would appreciate any help.

However, I see a possible inconsistency with the behaviour of the ES API when 
it is used directly via cURL.
The same query as above, sent without any search_type setting, works 
correctly:

curl 'localhost:9200/logs/app/_search?pretty' -d'{"query" : { "term": { "appName": "console" } },
"aggregations": { "unusual": { "significant_terms": {"field": "pathname"} }}}'

It returns both hits:{} and aggregations:{}. Why does the Spark integration 
not work the same way?


Incorrect Aggregations returned from ES

2015-03-25 Thread MC

I am seeing some erroneous behavior in my ES cluster when performing
aggregations. Originally, I thought this was specific to histograms, as
that is where the error first appeared (in a K3 graph - see my post
https://groups.google.com/forum/#!topic/elasticsearch/iY-lKjtW7PM for
reference), but I have been able to reproduce the problem with a simple
max aggregation. The details are as follows:

ES Version: 1.4.4
Topology: 5 nodes, 5 shards per index, 2 replicas
OS: Redhat Linux

To create the issue I execute the following query against the cluster:

{
  "query": {
    "term": {
      "metric": "used"
    }
  },
  "aggs": {
    "max_val": {
      "max": {
        "field": "metric_value"
      }
    }
  }
}

Upon executing this query multiple times, I get different responses. One
time I get the expected result:
...
"took": 13,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 11712,
"max_score": 9.361205,
...
"aggregations": { "max_val": { "value": 18096380}}

whereas on another request with the same query I get the following bad
response:

"took": 8,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 11712,
"max_score": 9.361205,
...
"aggregations": { "max_val": { "value": 4697741490703565000}}

Some possibly relevant observations:
1. In my first set of tests, I was consistently getting the correct
results for the first 2 requests and the bad result on the 3rd request
(with no one else executing this query at that point in time)
2. Flushing the cache did not correct the issue
3. I reduced the number of replicas to 0 and was consistently getting the
same result (which happened to be the correct one)
4. After increasing the replica count back to 2 and waiting until ES
reported that the replication was complete, I tried the same experiment.
This time, the 1st request retrieved the correct result and the next 2
requests retrieved incorrect results. In this case the incorrect results
were not the same but were both huge and of the same order of magnitude.

Other info:
- The size of the index was about 3.3 GB with ~50M documents in it
- This is one of many date based indices (i.e. similar to the logstash
index setup), but the only one in this installation that exhibited the
issue. I believe we saw something similar in a UAT environment as well
where 1 or 2 of the indices acted in this weird manner
- ES reported the entire cluster as green

It seems that some shard(s)/replica(s) were being corrupted on the
replication and we were being routed to that one every 3rd hit. (Is this
somehow correlated to the number of replicas?)

So, my questions are:

1. Has anyone seen this type of behavior before?
2. Can it somehow be data dependent?
3. Is there any way to figure out what happened/what is happening?
4. Why does ES report the cluster state as green?
5. How can I debug this?
6. How can I prevent/correct this?
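
One way to narrow this down is to send the same aggregation to one shard at a time and compare the primary copy with whatever copy the cluster picks by default. The sketch below only builds the request URLs, using the search `preference` parameter (`_shards:N`, optionally combined with `_primary`). The index name `metrics-2015.03.25` and the helper `shard_probe_urls` are placeholders invented for illustration, and the host is assumed to be a local node.

```python
from urllib.parse import urlencode

ES_URL = "http://localhost:9200"   # assumption: any node of the cluster
INDEX = "metrics-2015.03.25"       # hypothetical index name, not from the post
NUM_SHARDS = 5                     # from the topology described above


def shard_probe_urls(index, num_shards, es_url=ES_URL):
    """Return (shard, primary_only_url, any_copy_url) for every shard."""
    probes = []
    base = "%s/%s/_search?" % (es_url, index)
    for shard in range(num_shards):
        # Restrict the search to one shard and force its primary copy.
        primary = urlencode([
            ("search_type", "count"),
            ("preference", "_shards:%d;_primary" % shard),
        ])
        # Restrict the search to one shard but let the cluster pick the copy.
        any_copy = urlencode([
            ("search_type", "count"),
            ("preference", "_shards:%d" % shard),
        ])
        probes.append((shard, base + primary, base + any_copy))
    return probes


for shard, primary_url, any_copy_url in shard_probe_urls(INDEX, NUM_SHARDS):
    print(shard, primary_url)
    print(shard, any_copy_url)
```

Posting the max aggregation to each URL several times should show whether the primary-only requests always return the expected maximum while the unrestricted requests for one particular shard sometimes return the huge value. If so, a replica of that shard is the likely suspect.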

Any and all help/pointers would be greatly appreciated.

Thanks in advance,
MC


Get super set frequency on significant terms aggregations

2015-03-25 Thread Yuchen Zhao

I'm looking for a way to display the superset frequency and subset 
frequency along with the significant terms, to provide more insight. From 
the significant terms aggregation output, I can use doc_count to calculate 
the subset frequency. But is there a way to get the superset frequency (the 
frequency with which the term appears in the background set)? 

Internally, the aggregation uses the superset and subset frequencies to 
derive the scores, so these numbers are already calculated by the significant 
terms aggregation. How can I get the frequencies in the response? Thanks!
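
As far as I know, each bucket in the significant terms response carries a `bg_count` next to `doc_count`, and the aggregation itself reports the subset size as its own `doc_count`. Assuming that response shape, the frequencies can be derived on the client side as sketched below. The sample numbers are invented, and `superset_size` (the number of documents in the background set) is assumed to be obtained separately, for example from a count of the whole index.

```python
def term_frequencies(agg_result, superset_size):
    """Compute subset and superset frequencies for each significant term.

    agg_result    -- the significant_terms part of the response (a dict)
    superset_size -- documents in the background set (assumed known)
    """
    subset_size = agg_result["doc_count"]
    rows = []
    for bucket in agg_result["buckets"]:
        rows.append({
            "term": bucket["key"],
            "subset_freq": bucket["doc_count"] / subset_size,
            "superset_freq": bucket["bg_count"] / superset_size,
        })
    return rows


# Invented sample in the shape of a significant_terms result.
sample = {
    "doc_count": 1000,
    "buckets": [
        {"key": "/admin/login", "doc_count": 50, "score": 0.3, "bg_count": 200},
    ],
}

rows = term_frequencies(sample, superset_size=100000)
for row in rows:
    print(row)
```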


Re: Grandparents aggregations NullPointer

2015-03-24 Thread Paolo Ciccarese

I've posted the issue here:
https://github.com/elastic/elasticsearch/issues/10158

On Wednesday, March 11, 2015 at 4:42:27 PM UTC-4, Paolo Ciccarese wrote:
>
> I am using Elasticsearch 1.4.4.
>
> I am defining a parent-child index with three levels (grandparents) 
> following the instructions in the document:
>
> https://www.elastic.co/guide/en/elasticsearch/guide/current/grandparents.html
>
> My structure is
> Continent -> Country -> Region
>
> MAPPING
> --
> I create an index with the following mapping:
>
> curl -XPOST 'localhost:9200/geo' -d'
> {
>   "mappings": {
> "continent": {},
> "country": {
>   "_parent": {
> "type": "continent" 
>   }
> },
> "region": {
>   "_parent": {
> "type": "country" 
>   }
> }
>   }
> }' 
>
> INDEXING
> --
> I index three entities:
>
> curl -XPOST 'localhost:9200/geo/continent/europe' -d'
> {
> "name":"Europe"
> }'
>
> curl -XPOST 'localhost:9200/geo/country/italy?parent=europe' -d'
> {
> "name":"Italy"
> }'
>
> curl -XPOST 
> 'localhost:9200/geo/region/lombardy?parent=italy&routing=europe' -d'
> {
> "name":"Lombardia"
> }'
>
> QUERY THAT WORKS
> 
> If I query and aggregate according to the document everything works fine:
>
> curl -XGET 'localhost:9200/geo/continent/_search?pretty=true' -d '
> {
> "query": {
> "has_child": {
> "type": "country",
> "query": {
> "has_child": {
> "type": "region",
> "query": {
> "match": {
> "name": "Lombardia"
> }
> }
> }
> }
> }
> },
> "aggs": {
> "country": {
> "terms": { 
> "field": "name"
> },
> "aggs": {
> "countries": {
> "children": {
> "type": "country"
> },
> "aggs": {
> "country_names" : {
> "terms" :  {
> "field" : "country.name"
> }
> }   
> }
> }
> }
> }
> }
> }'
>
> QUERY THAT DOES NOT WORK
> ---
> However, if I try with multi-level aggregations like in:
>
> curl -XGET 'localhost:9200/geo/continent/_search?pretty=true' -d '
> {
> "query": {
> "has_child": {
> "type": "country",
> "query": {
> "has_child": {
> "type": "region",
> "query": {
> "match": {
> "name": "Lombardia"
> }
> }
> }
> }
> }
> },
> "aggs": {
> "continent_names": {
> "terms": { 
> "field": "name"
> },
> "aggs": {
> "countries": {
> "children": {
> "type": "country"
> }, 
> "aggs": {
> "regions": {
> "children": {
> "type": "region"
> }, 
> "aggs": {
> "region_names" : {
> "terms" :  {
> "field" : "region.name"
> }
> }
> }
> }
> }
> }
> }
> }
> }
> }'
>
> I get back the following
>
> {
>
>   "error" : "SearchPhaseExecutionException[Failed to execute phase 
> [query], all shards failed; shardFailures 
> {[b5CbW5byQdSSW-rIwta0rA][geo][0]: QueryPhaseExecutionException[[geo][0]: 
> query[filtered(child_filter[country/continent](filtered(child_filter[region/country](filtered(name:lombardia)->cache(_type:region)))->cache(_type:country)))->cache(_type:continent)],from[0],size[10]:
>  
> Query Failed [Faile

Aggregations of phrases

2015-03-20 Thread Bruno Kamiche

I have a field that contains text from different sources (it could be a 
Facebook post, a tweet, a blog article, etc.), so it varies in length.

I need to find common phrases in that field to determine conversation 
subjects.

The terms aggregation works fine for single words, but I need to find 
phrases. I guess setting the field to "not_analyzed" is not the solution, as 
the whole text would then be treated as a single term.

Is there any way to accomplish this?
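
One approach that may fit is to index the text through an analyzer with a shingle token filter, so that word n-grams become terms that a terms aggregation can count. The settings below are a sketch under that assumption: the names `phrase_shingles` and `phrase_analyzer` are invented, and the shingle sizes are arbitrary. The small function underneath mimics locally what such an analyzer would roughly emit, to show what the aggregation would be counting.

```python
from collections import Counter

# Hypothetical index settings: a custom analyzer that emits 2- and 3-word shingles.
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "filter": {
                "phrase_shingles": {
                    "type": "shingle",
                    "min_shingle_size": 2,
                    "max_shingle_size": 3,
                    "output_unigrams": False,
                }
            },
            "analyzer": {
                "phrase_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "phrase_shingles"],
                }
            },
        }
    }
}


def shingles(text, min_size=2, max_size=3):
    """Return lowercase word n-grams, a rough local stand-in for the analyzer."""
    words = text.lower().split()
    out = []
    for size in range(min_size, max_size + 1):
        for start in range(len(words) - size + 1):
            out.append(" ".join(words[start:start + size]))
    return out


posts = [
    "Battery life is great",
    "the battery life is poor",
    "Great battery life",
]

counts = Counter()
for post in posts:
    # set(): count a phrase once per document, like a bucket's doc_count.
    counts.update(set(shingles(post)))

print(counts.most_common(3))
```

A terms aggregation on a field analyzed this way would then return phrases as bucket keys. Shingles multiply the number of terms per document, so memory use is worth watching on long texts.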



Re: Aggregations across multiple indices

2015-03-14 Thread Christian Rohling

Karl, thank you. That does solve the problem.

-Christian
On Mar 12, 2015 5:35 PM, "Karl Putland"  wrote:

> you might look at
> http://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html#search-aggregations-metrics-cardinality-aggregation
>
> --K
>
> Karl Putland
> Senior Engineer
> *SimpleSignal*
> Anywhere: 303-242-8608
> <http://www.simplesignal.com/explainer_video.php>
>
>
> On Thu, Mar 12, 2015 at 10:04 AM, Christian Rohling 
> wrote:
>
>> Hello Everyone,
>> I am attempting to use aggregations to count the number of documents
>> matching a given query across multiple indices. What I would like to do, is
>> make those counts on distinct keys. Say I had following document in 2
>> different indices, aliased together.
>> ```
>> {
>> _index: myindex
>> _type: mytype
>> _id: 1
>> _version: 1
>> _score: 1
>> _source: {
>> country: MEXICO
>> }
>> }```
>>
>> When I make an aggs term query on the field "country" I would like it to
>> only return a single count for the document with id=1(which exists in both
>> indices). The actual use case is a bit more complicated than what's
>> described above, this is just an example of the functionality that I am
>> looking for. I cannot find any info in the docs, and have asked in the IRC
>> channel to no avail.
>>
>> -Christian Rohling
>>


Re: Aggregations across multiple indices

2015-03-12 Thread Karl Putland

You might look at the cardinality aggregation:
http://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html#search-aggregations-metrics-cardinality-aggregation
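
A cardinality sub-aggregation under a terms aggregation would count distinct values per bucket instead of raw documents. The request body below sketches that, assuming the documents carry a field holding the shared key (called `doc_key` here, a made-up name; the thread does not say which field identifies a document across indices). The function after it reproduces the idea locally on hits from two indices, where `myindex2` is also an invented name. Note that the real aggregation returns an approximate count, while this local version is exact.

```python
from collections import defaultdict

# Sketch of a request body: distinct doc_key values per country.
REQUEST_BODY = {
    "size": 0,
    "aggs": {
        "by_country": {
            "terms": {"field": "country"},
            "aggs": {
                "distinct_docs": {"cardinality": {"field": "doc_key"}}
            },
        }
    },
}


def distinct_per_country(hits):
    """Count distinct doc_key values per country, ignoring the source index."""
    keys = defaultdict(set)
    for hit in hits:
        keys[hit["country"]].add(hit["doc_key"])
    return {country: len(ids) for country, ids in keys.items()}


# The same document (doc_key "1") exists in both aliased indices.
hits = [
    {"index": "myindex", "doc_key": "1", "country": "MEXICO"},
    {"index": "myindex2", "doc_key": "1", "country": "MEXICO"},
    {"index": "myindex2", "doc_key": "2", "country": "MEXICO"},
]

result = distinct_per_country(hits)
print(result)
```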

--K

Karl Putland
Senior Engineer
*SimpleSignal*
Anywhere: 303-242-8608
<http://www.simplesignal.com/explainer_video.php>


On Thu, Mar 12, 2015 at 10:04 AM, Christian Rohling 
wrote:

> Hello Everyone,
> I am attempting to use aggregations to count the number of documents
> matching a given query across multiple indices. What I would like to do, is
> make those counts on distinct keys. Say I had following document in 2
> different indices, aliased together.
> ```
> {
> _index: myindex
> _type: mytype
> _id: 1
> _version: 1
> _score: 1
> _source: {
> country: MEXICO
> }
> }```
>
> When I make an aggs term query on the field "country" I would like it to
> only return a single count for the document with id=1(which exists in both
> indices). The actual use case is a bit more complicated than what's
> described above, this is just an example of the functionality that I am
> looking for. I cannot find any info in the docs, and have asked in the IRC
> channel to no avail.
>
> -Christian Rohling
>


Aggregations across multiple indices

2015-03-12 Thread Christian Rohling

Hello Everyone,
I am attempting to use aggregations to count the number of documents
matching a given query across multiple indices. What I would like to do is
make those counts on distinct keys. Say I had the following document in two
different indices, aliased together.
```
{
  _index: myindex
  _type: mytype
  _id: 1
  _version: 1
  _score: 1
  _source: {
    country: MEXICO
  }
}
```

When I run a terms aggregation on the field "country", I would like it to
return only a single count for the document with id=1 (which exists in both
indices). The actual use case is a bit more complicated than what's
described above; this is just an example of the functionality that I am
looking for. I cannot find any info in the docs, and have asked in the IRC
channel to no avail.

-Christian Rohling


Grandparents aggregations NullPointer

2015-03-11 Thread Paolo Ciccarese

I am using Elasticsearch 1.4.4.

I am defining a parent-child index with three levels (grandparents) 
following the instructions in the document:
https://www.elastic.co/guide/en/elasticsearch/guide/current/grandparents.html

My structure is
Continent -> Country -> Region

MAPPING
--
I create an index with the following mapping:

curl -XPOST 'localhost:9200/geo' -d'
{
  "mappings": {
    "continent": {},
    "country": {
      "_parent": {
        "type": "continent"
      }
    },
    "region": {
      "_parent": {
        "type": "country"
      }
    }
  }
}'

INDEXING
--
I index three entities:

curl -XPOST 'localhost:9200/geo/continent/europe' -d'
{
"name":"Europe"
}'

curl -XPOST 'localhost:9200/geo/country/italy?parent=europe' -d'
{
"name":"Italy"
}'

curl -XPOST 
'localhost:9200/geo/region/lombardy?parent=italy&routing=europe' -d'
{
"name":"Lombardia"
}'

QUERY THAT WORKS
--
If I query and aggregate as described in the guide, everything works fine:

curl -XGET 'localhost:9200/geo/continent/_search?pretty=true' -d '
{
  "query": {
    "has_child": {
      "type": "country",
      "query": {
        "has_child": {
          "type": "region",
          "query": {
            "match": {
              "name": "Lombardia"
            }
          }
        }
      }
    }
  },
  "aggs": {
    "country": {
      "terms": {
        "field": "name"
      },
      "aggs": {
        "countries": {
          "children": {
            "type": "country"
          },
          "aggs": {
            "country_names": {
              "terms": {
                "field": "country.name"
              }
            }
          }
        }
      }
    }
  }
}'

QUERY THAT DOES NOT WORK
---
However, if I try with multi-level aggregations like in:

curl -XGET 'localhost:9200/geo/continent/_search?pretty=true' -d '
{
  "query": {
    "has_child": {
      "type": "country",
      "query": {
        "has_child": {
          "type": "region",
          "query": {
            "match": {
              "name": "Lombardia"
            }
          }
        }
      }
    }
  },
  "aggs": {
    "continent_names": {
      "terms": {
        "field": "name"
      },
      "aggs": {
        "countries": {
          "children": {
            "type": "country"
          },
          "aggs": {
            "regions": {
              "children": {
                "type": "region"
              },
              "aggs": {
                "region_names": {
                  "terms": {
                    "field": "region.name"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}'

I get back the following

{

  "error" : "SearchPhaseExecutionException[Failed to execute phase [query], 
all shards failed; shardFailures {[b5CbW5byQdSSW-rIwta0rA][geo][0]: 
QueryPhaseExecutionException[[geo][0]: 
query[filtered(child_filter[country/continent](filtered(child_filter[region/country](filtered(name:lombardia)->cache(_type:region)))->cache(_type:country)))->cache(_type:continent)],from[0],size[10]:
 
Query Failed [Failed to execute main query]]; nested: NullPointerException; 
}{[b5CbW5byQdSSW-rIwta0rA][geo][1]: QueryPhaseExecutionException[[geo][1]: 
query[filtered(child_filter[country/continent](filtered(child_filter[region/country](filtered(name:lombardia)->cache(_type:region)))->cache(_type:country)))->cache(_type:continent)],from[0],size[10]:
 
Query Failed [Failed to execute main query]]; nested: NullPointerException; 
}{[b5CbW5byQdSSW-rIwta0rA][geo][2]: QueryPhaseExecutionException[[geo][2]: 
query[filtered(child_filter[country/continent](filtered(child_filter[region/country](filtered(name:lombardia)->cache(_type:region)))->cache(_type:country)))->cache(_type:continent)],from[0],size[10]:
 
Query Failed [Failed to execute main query]]; nested: NullPointerException; 
}{[b5CbW5byQdSSW-rIwta0rA][geo][3]: QueryPhaseExecutionException[[geo][3]: 
query[filtered(child_filter[country/continent](filtered(child_filter[regi

Re: Aggregations failing on fields with custom analyzer..

2015-03-10 Thread David Pilato

I'm going to make sure you get an answer soonish.

Best.

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

> On 10 March 2015, at 03:18, Anil Karaka wrote:
> 
> Hello David,
> 
> Currently we are using Elasticsearch in production to support heavy 
> aggregation queries. It was working fine, but recently data nodes keep 
> leaving the cluster regularly.. Each node around 3-5 times a day. And the 
> scary part is cluster is in red state for few minutes each day. We tried 
> changing the Garbage collector, to G1GC to prevent stop-the-world garbage 
> collection state. But it still keeps happening. Nodes keep leaving and 
> rejoining the cluster.
> 
> We will need some help in fixing our current issues.. Even though nodes are 
> leaving and rejoining the cluster, we are only missing around 1000 documents 
> out of around 10 million documents per day during indexing.. Still we need to 
> address this issue.
> 
> And in addition to that we are going to face some scaling issues in near 
> future. We want some production support to validate our current cluster 
> setup, shard/replica settings, and indexing settings, not to mention cost 
> savings.
> 
> I sent the same message on the elasticsearch.org webpage, and am still 
> waiting for their response. How do we approach elasticsearch support?
> 
> Thank you.
> 
> 
>> On Thursday, February 19, 2015 at 2:34:14 PM UTC+5:30, David Pilato wrote:
>> If you can provide a full example working as I did, we can try it and see 
>> what is wrong.
>> 
>> -- 
>> David Pilato | Technical Advocate | Elasticsearch.com
>> @dadoonet | @elasticsearchfr | @scrutmydocs
>> 
>> 
>> 
>>> On 19 Feb 2015, at 10:01, Anil Karaka wrote:
>>> 
>>> I"m getting this error as well using your PUT requests..
>>> 
>>> It feels like I'm doing something wrong.. But I don't know what exactly..
>>> 
>>> I'm using this index template.. 
>>> https://gist.github.com/syllogismos/c2dde4f097fea149e1a0
>>> 
>>> I didn't specify a particular mapping from my index but reindexed from a 
>>> previous index.. and ended up with that mapping and documents that looks 
>>> like above.. Am I seeing things and an obvious mistake? So lost right now..
>>> 
 On Thursday, February 19, 2015 at 2:23:10 PM UTC+5:30, David Pilato wrote:
 I think you are doing something wrong.
 
 DELETE index
 PUT index
 {
   "mappings": {
 "doc": {
   "properties": {
 "foo": {
   "type": "double"
 }
   }
 }
   }
 }
 PUT index/doc/1
 {
   "foo": "bar"
 }
 
 gives:
 
 {
"error": "MapperParsingException[failed to parse [foo]]; nested: 
 NumberFormatException[For input string: \"bar\"]; ",
"status": 400
 }
 
 -- 
 David Pilato | Technical Advocate | Elasticsearch.com
 @dadoonet | @elasticsearchfr | @scrutmydocs
 
 
 
 On 19 Feb 2015, at 09:39, Anil Karaka wrote:
 
 "_source" : {
 "Sort" : "",
 "gt" : "2015-02-18T15:07:10",
 "uid" : "54867dc55b482b04da7f23d8",
 "usId" : "54867dc55b482b04da7f23d7",
 "ut" : "2015-02-18T20:37:10",
 "act" : "productlisting",
 "st" : "2015-02-18T15:07:46",
 "Filter" : "",
 "av" : "3.0.0.0",
 "ViewType" : "SmallSingleList",
 "os" : "Windows",
 "categoryid" : "home-kitchen-curtains-blinds"
 }
 
 "properties" : {
 "uid" : {
 "analyzer" : "case_insensitive_keyword_analyzer",
 "type" : "string"
 },
 "ViewType" : {
 "analyzer" : "case_insensitive_keyword_analyzer",
 "type" : "string"
 },
 "usId" : {
 "analyzer" : "case_insensitive_keyword_analyzer",
 "type" : "string"
 },
 "os" : {
 "analyzer" : "case_insensitive_keyword_analyzer",
 "type" : "string"
 },
 "Sort" : {
 "analyzer" : "case_insensitive_keyword_analyzer",
 "type" : "string"
 },
 "Filter" : {
 "analyzer" : "case_insensitive_keyword_analyzer",
 "type" : "string"
 },
 "categoryid" : {
 "type" : "double"
 },
 "gt" : {
 "format" : "dateOptionalTime",
 "type" : "date"
 },
 "ut" : {
 "format" : "dateOptionalTime",
 "type" : "date"
 },
 "st" : {
 "format" : "dateOptionalTime",
 "type" : "date"
 },
 "act" : {
 "analyzer" : "case_insensitive_keyword_analyzer",
 "type" : "string"
 },
 "av" : {
 "analyzer" : "case_insensitive_keyword_analyzer",
 "type" : "string"
 }
 }
 
 
 A sample document and the index mappings above..
 
 
 On Thursday, February 19, 2015 at 2:03:11 PM UTC+5:30, David Pilato wrote:
 I don’t know without a concrete example.
 I’d say that if you map have a type number and you send "123" it could 
 work. 
 
 -- 
 David Pilato | Technical Advocate | Elasticsearch.com
 @dadoonet | @elasticsearchfr | @scrutmydocs
 
 
 
 Le 19 fév

Re: Aggregations failing on fields with custom analyzer..

2015-03-10 Thread Anil Karaka

Hello David,

Currently we are using Elasticsearch in production to support heavy 
aggregation queries. It was working fine, but recently data nodes have been 
leaving the cluster regularly, each node around 3-5 times a day. The scary 
part is that the cluster is in a red state for a few minutes each day. We 
tried changing the garbage collector to G1GC to avoid stop-the-world garbage 
collection pauses, but it still keeps happening. Nodes keep leaving and 
rejoining the cluster.

We need some help fixing these issues. Even though nodes are leaving and 
rejoining the cluster, we are only missing around 1000 documents out of 
around 10 million documents per day during indexing. Still, we need to 
address this.

In addition, we are going to face some scaling issues in the near future. 
We want production support to validate our current cluster setup, 
shard/replica settings, and indexing settings, not to mention cost savings.

I sent the same message through the elasticsearch.org webpage and am still 
waiting for a response. How do we approach Elasticsearch support?

Thank you.


On Thursday, February 19, 2015 at 2:34:14 PM UTC+5:30, David Pilato wrote:
>
> If you can provide a full example working as I did, we can try it and see 
> what is wrong.
>
> -- 
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
> @dadoonet | @elasticsearchfr | @scrutmydocs
>
> On 19 Feb 2015, at 10:01, Anil Karaka wrote:
>
> I"m getting this error as well using your PUT requests..
>
> It feels like I'm doing something wrong.. But I don't know what exactly..
>
> I'm using this index template.. 
> https://gist.github.com/syllogismos/c2dde4f097fea149e1a0
>
> I didn't specify a particular mapping from my index but reindexed from a 
> previous index.. and ended up with that mapping and documents that looks 
> like above.. Am I seeing things and an obvious mistake? So lost right now..
>
> On Thursday, February 19, 2015 at 2:23:10 PM UTC+5:30, David Pilato wrote:
>>
>> I think you are doing something wrong.
>>
>> DELETE index
>> PUT index
>> {
>>   "mappings": {
>>     "doc": {
>>       "properties": {
>>         "foo": {
>>           "type": "double"
>>         }
>>       }
>>     }
>>   }
>> }
>> PUT index/doc/1
>> {
>>   "foo": "bar"
>> }
>>
>> gives:
>>
>> {
>>   "error": "MapperParsingException[failed to parse [foo]]; nested: NumberFormatException[For input string: \"bar\"]; ",
>>   "status": 400
>> }
>>
>> -- 
>> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
>> @dadoonet | @elasticsearchfr | @scrutmydocs
>>
>> On 19 Feb 2015, at 09:39, Anil Karaka wrote:
>>
>> "_source" : {
>> "Sort" : "",
>> "gt" : "2015-02-18T15:07:10",
>> "uid" : "54867dc55b482b04da7f23d8",
>> "usId" : "54867dc55b482b04da7f23d7",
>> "ut" : "2015-02-18T20:37:10",
>> "act" : "productlisting",
>> "st" : "2015-02-18T15:07:46",
>> "Filter" : "",
>> "av" : "3.0.0.0",
>> "ViewType" : "SmallSingleList",
>> "os" : "Windows",
>> "categoryid" : "home-kitchen-curtains-blinds"
>> }
>>
>> "properties" : {
>> "uid" : {
>> "analyzer" : "case_insensitive_keyword_analyzer",
>> "type" : "string"
>> },
>> "ViewType" : {
>> "analyzer" : "case_insensitive_keyword_analyzer",
>> "type" : "string"
>> },
>> "usId" : {
>> "analyzer" : "case_insensitive_keyword_analyzer",
>> "type" : "string"
>> },
>> "os" : {
>> "analyzer" : "case_insensitive_keyword_analyzer",
>> "type" : "string"
>> },
>> "Sort" : {
>> "analyzer" : "case_insensitive_keyword_analyzer",
>> "type" : "string"
>> },
>> "Filter" : {
>> "analyzer" : "case_insensitive_keyword_analyzer",
>> "type" : "string"
>> },
>> "categoryid" : {
>> "type" : "double"
>> },
>> "gt" : {
>> "format" : "dateOptionalTime",
>> "type" : "date"
>> },
>> "ut" : {
>> "format" : "dateOptionalTime",
>> "type" : "date"
>> },
>> "st" : {
>> "format" : "dateOptionalTime",
>> "type" : "date"
>> },
>> "act" : {
>> "analyzer" : "case_insensitive_keyword_analyzer",
>> "type" : "string"
>> },
>> "av" : {
>> "analyzer" : "case_insensitive_keyword_analyzer",
>> "type" : "string"
>> }
>> }
>>
>>
>> A sample document and the index mappings above..
>>
>>
>> On Thursday, February 19, 2015 at 2:03:11 PM UTC+5:30, David Pilato wrote:
>>
>> I don’t know without a concrete example.
>> I’d say that if the field is mapped as a numeric type and you send "123", it 
>> could work. 
>>
>> -- 
>> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
>> @dadoonet | @elasticsearchfr | @scrutmydocs
>>
>> On 19 Feb 2015, at 09:30, Anil Karaka wrote:
>>
>> It was my mistake, the fi

Re: Term aggregations and filtering values with the Java client

2015-03-05 Thread Colin Goodheart-Smithe

How about this:

AggregationBuilder aggregation = AggregationBuilders
    .terms(queryTerm)
    .field(queryField)
    .size(size)
    .order(Terms.Order.term(true))
    .include(includeRegex)
    .exclude(excludeRegex);
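
To make the effect of the two parameters concrete, here is a rough Python sketch of how include and exclude narrow down the terms that get buckets. It is only an illustration: Python's `re` module is not the Lucene regular expression engine Elasticsearch uses, the tag values are made up, and the sketch assumes that a pattern must match the whole term and that exclude wins when both patterns match.

```python
import re

def filter_terms(terms, include=None, exclude=None):
    """Keep terms that fully match `include` and do not fully match `exclude`.

    Assumption: patterns are anchored to the whole term, and a term that
    matches both patterns is dropped (exclude takes precedence).
    """
    kept = []
    for term in terms:
        if include is not None and re.fullmatch(include, term) is None:
            continue  # not selected by the include pattern
        if exclude is not None and re.fullmatch(exclude, term) is not None:
            continue  # selected, but explicitly excluded
        kept.append(term)
    return kept

# Hypothetical values of the "tags" field.
tags = ["sport", "water_sports", "motorsport", "music", "sports_news"]
print(filter_terms(tags, include=".*sport.*", exclude="water_.*"))
# ['sport', 'motorsport', 'sports_news']
```

With the patterns from the documentation example, "water_sports" matches the include pattern but is removed again by the exclude pattern, and "music" never matches the include pattern.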


Hope that helps,

Colin

On Wednesday, 4 March 2015 18:20:37 UTC, zhang...@gmail.com wrote:
>
> Hi all,
>
> According to the documentation online and I quote:
>
> "It is possible to filter the values for which buckets will be created. 
> This can be done using the include and exclude parameters which are based 
> on regular expressions."
>
> {
> "aggs" : {
> "tags" : {
> "terms" : {
> "field" : "tags",
> "include" : ".*sport.*",
> "exclude" : "water_.*"
> }
> }
> }
> }
>
>
> See: 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html
>
> I wonder how the include and exclude parameters can be used with the Java 
> client.
>
> Currently, I use the AggregationBuilder  class  to add a terms aggregation 
> to my query like this. 
>
> AggregationBuilder aggregation = AggregationBuilders
> .terms(queryTerm)
> .field(queryField)
> .size(size.or(1))
> .order(Terms.Order.term(true));
>
> However, I wonder how to use the "include" as described above using the 
> Java client.
> Any help is greatly appreciated! Thanks
>
> /JZ
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/67fa4d7a-e803-45cb-86af-63cb342700ad%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Term aggregations and filtering values with the Java client

2015-03-04 Thread zhangjunte

Hi all,

According to the documentation online and I quote:

"It is possible to filter the values for which buckets will be created. 
This can be done using the include and exclude parameters which are based 
on regular expressions."

{
    "aggs" : {
        "tags" : {
            "terms" : {
                "field" : "tags",
                "include" : ".*sport.*",
                "exclude" : "water_.*"
            }
        }
    }
}


See: 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html

I wonder how the include and exclude parameters can be used with the Java 
client.

Currently, I use the AggregationBuilder class to add a terms aggregation 
to my query like this:

AggregationBuilder aggregation = AggregationBuilders
.terms(queryTerm)
.field(queryField)
.size(size.or(1))
.order(Terms.Order.term(true));

However, I wonder how to use the "include" as described above using the 
Java client.
Any help is greatly appreciated! Thanks

/JZ


Weird aggregations results

2015-03-04 Thread benjamin . shopinvest

Hi,

We're trying to get statistics on our sales and are facing weird results 
when using aggregations.

Here's an extract of our mapping:

{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "path-analyzer": {
            "type": "custom",
            "tokenizer": "path-tokenizer"
          }
        },
        "tokenizer": {
          "path-tokenizer": {
            "type": "path_hierarchy"
          }
        }
      }
    }
  },
  "mappings": {
    "order": {
      "properties": {
        "product": {
          "type": "nested",
          "properties": {
            "id": {
              "type": "integer"
            },
            "price": {
              "type": "float"
            },
            "category": {
              "type": "nested",
              "properties": {
                "path": {
                  "index_analyzer": "path-analyzer",
                  "search_analyzer": "keyword",
                  "type": "string"
                }
              }
            }
          }
        }
      }
    }
  }
}

This extract describes how our products and the categories they belong to 
are stored in an order.
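
For readers unfamiliar with the path_hierarchy tokenizer used in the mapping, the following Python sketch imitates the tokens it emits for a category path. It is a simplified stand-in, not the Elasticsearch implementation: it assumes the default "/" delimiter and a path without a leading delimiter, and the category "books/thriller/nordic" is a made-up value.

```python
def path_hierarchy_tokens(path, delimiter="/"):
    """Return every ancestor prefix of `path`, shortest first.

    Simplified imitation of the path_hierarchy tokenizer; assumes the path
    does not start with the delimiter.
    """
    parts = path.split(delimiter)
    return [delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]

# Hypothetical category path stored in product.category.path.
print(path_hierarchy_tokens("books/thriller/nordic"))
# ['books', 'books/thriller', 'books/thriller/nordic']
```

Because each ancestor path is indexed as its own token, a product filed under "books/thriller/nordic" can be found through "books" as well as through "books/thriller".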

What we're trying to do is to get the best selling products by category.

We're able to filter by category and aggregate products:

{
  "query":{
"bool":{
  "must":[
[
  {
"nested":{
  "path":"product",
  "filter":{
"nested":{
  "path":"product.category",
  "filter":{
"bool":{
  "must":[
{
  "prefix":{
"product.category.path":"livres/thriller"
  }
}
  ]
    }
      }
}
  }
}
  }
]
  ]
}
  },
  "aggs":{
"produit":{
  "nested":{
"path":"product"
  },
  "aggs":{
"product_id":{
  "terms":{
"field":"product.id",
"size":0
  },
  "aggs":{
"number":{
  "sum":{
"field":"product.price"
  }
}
  }
}
  }
}
  }
}

As shown in this result:

{
  "aggregations":{
"product":{
  "doc_count":400,
  "product_id":{
"buckets":[
  {
"key":430,
"doc_count":95,
"number":{
  "value":1450.0
}
  }
]
  }
}
  }
}

The problem is that when we change the category to "books", for example, we 
logically find the product 430 but with different numbers:

{
  "aggregations":{
"product":{
  "doc_count":800,
  "product_id":{
"buckets":[
  {
"key":430,
"doc_count":100,
"number":{
  "value":1500.0
}
  }
]
  }
}
  }
}

The obvious answer would be that in some orders product 430 is in 
"books" but not in "books/thriller", but we checked and that is not the case.

Having explained our situation, I only have one question: does anyone know 
why we got these results?

Subsidiary question: we have trouble understanding how ES works, and 
especially how to debug our requests. If you could explain to us how you do it, 
that would be greatly appreciated around here. :)

Thanks !


Re: Bad performance on aggregations

2015-03-04 Thread Adrien Grand

What do the hot threads look like while the query is running?
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-hot-threads.html
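
A minimal sketch of capturing that output over HTTP while the slow query is running. The node address http://localhost:9200 is a placeholder, and the helper names are invented for this example; only the `_nodes/hot_threads` endpoint and its `threads` and `interval` parameters come from the API linked above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

def hot_threads_url(base_url="http://localhost:9200", threads=3, interval="500ms"):
    """Build the URL of the hot threads endpoint (base_url is a placeholder)."""
    query = urlencode({"threads": threads, "interval": interval})
    return f"{base_url}/_nodes/hot_threads?{query}"

def fetch_hot_threads(base_url="http://localhost:9200"):
    """Return the plain-text hot threads report; needs a reachable node."""
    with urlopen(hot_threads_url(base_url)) as response:
        return response.read().decode("utf-8")

print(hot_threads_url())
# http://localhost:9200/_nodes/hot_threads?threads=3&interval=500ms
```

Calling `fetch_hot_threads()` a few times during the 18 seconds the aggregation takes should show where the node spends its time.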

On Tue, Feb 24, 2015 at 4:28 PM, Octavian 
wrote:

> Hello,
>
> I have a problem with the performance of aggregations: the aggregation
> takes far too long.
>
> I'm doing the next aggregation over an index with 160M documents (16G of
> data).
>
> {
>   "query": {
> "filtered": {
>   "filter": {
> "range": {
>   "_cache": false,
>   "insert_date": {
> "gte": 1424790449432
>   }
> }
>   }
> }
>   },
>   "aggs": {
> "tag": {
>   "terms": {
> "field": "origin_ip"
>   }
> }
>   }
> }
>
> Time: 18s. No results found (The result is correct. There are no documents
> with insert_date greater than 1424790449432)
>
> However if I'm doing the next search:
> {
>   "query": {
> "filtered": {
>   "filter": {
> "range": {
>   "_cache": false,
>   "insert_date": {
> "gte": 1424790449432
>   }
> }
>   }
> }
>   }
> }
>
> Time: 7ms  . No results found. (As I already wrote, the result is correct).
>
> What is happening?
>
> In documentation (
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_filtered_query.html),
> it is written :"The query (which happens to include a filter) returns a
> certain subset of documents, and the aggregation operates on those
> documents."
>
> In my situation, there are no elements in the subset of documents returned
> by the filter, so the aggregation should run in the same amount of time
> like the search.
>
> So, how can I improve the performance of that aggregation?
>
> Thank you,
>
>



-- 
Adrien Grand


Re: Bad performance on aggregations

2015-03-04 Thread Sávio S . Teles de Oliveira

What are your Elasticsearch Java parameters (such as heap size)? Try using a
range query on the date field:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-range-query.html

2015-03-04 8:20 GMT-03:00 Octavian :

>  Hello,
>
> Can anybody help me on this problem? Is this a known bug in Elasticsearch?
>
> Thank you
>
>



-- 
Regards,

Sávio S. Teles de Oliveira

Co-Founder & Software Engineer at www.gogeo.io.
PHD student in Computer Science focusing on High Performance Maps Platform
and Spatial Algorithms.
voice: +55 62 9136 6996
http://br.linkedin.com/in/savioteles
https://twitter.com/savioteless


Re: Bad performance on aggregations

2015-03-04 Thread Octavian

 Hello,

Can anybody help me on this problem? Is this a known bug in Elasticsearch?

Thank you


Re: Is it possible to have a Kibana 4 "Data table" without aggregations?

2015-03-02 Thread Celio Nogueira de Faria Jr

Precisely! thanks!

On Monday, March 2, 2015 at 14:43:23 UTC-3, 
christian...@elasticsearch.com wrote:
>
> Hi,
>
> When you create a dashboard you have the option to add searches as well as 
> visualisation. This will give you a discovery mode view on the data in the 
> dashboard.
>
> Is that what you are looking for?
>
> Best regards,
>
> Christian
>
> On Monday, March 2, 2015 at 5:27:41 PM UTC, Celio Nogueira de Faria Jr 
> wrote:
>>
>> Hi all,
>>
>> is there a way of replicating Kibana 3 "table" widget behavior within 
>> Kibana 4? I just need to show a filtered table and the new "Data table" 
>> widget is only for showing the results of composed aggregation(s).
>>
>> Thanks, Celio.
>>
>


Re: Is it possible to have a Kibana 4 "Data table" without aggregations?

2015-03-02 Thread Celio Nogueira de Faria Jr

Hi again,

solution found:
http://stackoverflow.com/questions/27851468/replicate-kibana-3-dashboard-using-kibana-4

Thanks!

On Monday, March 2, 2015 at 14:27:41 UTC-3, Celio Nogueira de 
Faria Jr wrote:
>
> Hi all,
>
> is there a way of replicating Kibana 3 "table" widget behavior within 
> Kibana 4? I just need to show a filtered table and the new "Data table" 
> widget is only for showing the results of composed aggregation(s).
>
> Thanks, Celio.
>


Re: Is it possible to have a Kibana 4 "Data table" without aggregations?

2015-03-02 Thread christian . dahlqvist

Hi,

When you create a dashboard you have the option to add searches as well as 
visualisations. This will give you a discovery mode view of the data in the 
dashboard.

Is that what you are looking for?

Best regards,

Christian

On Monday, March 2, 2015 at 5:27:41 PM UTC, Celio Nogueira de Faria Jr 
wrote:
>
> Hi all,
>
> is there a way of replicating Kibana 3 "table" widget behavior within 
> Kibana 4? I just need to show a filtered table and the new "Data table" 
> widget is only for showing the results of composed aggregation(s).
>
> Thanks, Celio.
>


Is it possible to have a Kibana 4 "Data table" without aggregations?

2015-03-02 Thread Celio Nogueira de Faria Jr

Hi all,

is there a way of replicating Kibana 3 "table" widget behavior within 
Kibana 4? I just need to show a filtered table and the new "Data table" 
widget is only for showing the results of composed aggregation(s).

Thanks, Celio.


Terms aggregations in docs with nested objects using a lot of memory

2015-03-02 Thread Roger de Cordova Farias

We are running ElasticSearch in a cluster with 1 node, 1 index, 6 shards,
55 million docs. We run queries with terms aggregation in 15 fields and it
works well, taking about 10 seconds to return.

We reindexed the docs in another cluster with 1 node, 1 index, 4 shards and
the same 55 million docs to run some tests. The mapping is a little
different, now having some nested objects. We run the same queries as
before (adapted to use nested queries and aggregations), but we always
get a circuit breaker error, because loading the fields into memory for the
aggregation would take more memory than is available.

Both machines have the same configuration (64GB of memory, running ES
with ES_HEAP_SIZE=32g).

I used the indices stats API (_stats/fielddata?fields=my_field&pretty) on
both machines to get fielddata information for a field whose mapping did not
change and which sits directly in the root document (not nested), and I got
a huge difference in memory usage:

*Machine 1:*

{
  "_shards" : {
    "total" : 8,
    "successful" : 4,
    "failed" : 0
  },
  "_all" : {
    "primaries" : {
      "fielddata" : {
        "memory_size_in_bytes" : 28132578552,
        "evictions" : 0,
        "fields" : {
          "my_field" : {
            "memory_size_in_bytes" : 224983649
          }
        }
      }
    },
    "total" : {
      "fielddata" : {
        "memory_size_in_bytes" : 28132578552,
        "evictions" : 0,
        "fields" : {
          "my_field" : {
            "memory_size_in_bytes" : 224983649
          }
        }
      }
    }
  },
  "indices" : {
    "my_index_1" : {
      "primaries" : {
        "fielddata" : {
          "memory_size_in_bytes" : 28132578552,
          "evictions" : 0,
          "fields" : {
            "my_field" : {
              "memory_size_in_bytes" : 224983649
            }
          }
        }
      },
      "total" : {
        "fielddata" : {
          "memory_size_in_bytes" : 28132578552,
          "evictions" : 0,
          "fields" : {
            "my_field" : {
              "memory_size_in_bytes" : 224983649
            }
          }
        }
      }
    }
  }
}


*Machine 2:*

{
  "_shards" : {
    "total" : 12,
    "successful" : 6,
    "failed" : 0
  },
  "_all" : {
    "primaries" : {
      "fielddata" : {
        "memory_size_in_bytes" : 6812053739,
        "evictions" : 0,
        "fields" : {
          "my_field" : {
            "memory_size_in_bytes" : 62533082
          }
        }
      }
    },
    "total" : {
      "fielddata" : {
        "memory_size_in_bytes" : 6812053739,
        "evictions" : 0,
        "fields" : {
          "my_field" : {
            "memory_size_in_bytes" : 62533082
          }
        }
      }
    }
  },
  "indices" : {
    "my_index_2" : {
      "primaries" : {
        "fielddata" : {
          "memory_size_in_bytes" : 6812053739,
          "evictions" : 0,
          "fields" : {
            "my_field" : {
              "memory_size_in_bytes" : 62533082
            }
          }
        }
      },
      "total" : {
        "fielddata" : {
          "memory_size_in_bytes" : 6812053739,
          "evictions" : 0,
          "fields" : {
            "my_field" : {
              "memory_size_in_bytes" : 62533082
            }
          }
        }
      }
    }
  }
}


While in the old index the field uses *62.5331MB*, in the new index it uses
*224.984MB*. Heavier fields that use about 1GB in the old index are using
4~6GB in the new index. With the 15 aggregations together, the memory usage
increased to a size that won't fit in the heap.

Does the fact that the documents have nested objects change the amount of
memory needed to keep non-nested fields in memory?

I tested using include_in_root in every nested object and doing all my
aggregations directly in the root doc (not using nested aggregations at all),
and still every field uses far more memory than in the old index, with the
same data. Can someone explain it? I have no clue.
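
To put a number on the difference, here is a small Python sketch that pulls the per-field fielddata size out of the two stats responses and compares them. The dictionaries are trimmed-down copies of the output above; the helper name `field_bytes` and the variable names are made up for the example.

```python
def field_bytes(stats, field):
    """Read one field's fielddata size (in bytes) from a _stats/fielddata response."""
    return stats["_all"]["total"]["fielddata"]["fields"][field]["memory_size_in_bytes"]

# Trimmed copies of the two responses shown above.
new_index_stats = {"_all": {"total": {"fielddata": {
    "fields": {"my_field": {"memory_size_in_bytes": 224983649}}}}}}
old_index_stats = {"_all": {"total": {"fielddata": {
    "fields": {"my_field": {"memory_size_in_bytes": 62533082}}}}}}

new_bytes = field_bytes(new_index_stats, "my_field")
old_bytes = field_bytes(old_index_stats, "my_field")
ratio = new_bytes / old_bytes

print(f"{old_bytes / 1e6:.1f} MB -> {new_bytes / 1e6:.1f} MB ({ratio:.2f}x)")
# 62.5 MB -> 225.0 MB (3.60x)
```

So this unchanged, non-nested field needs roughly 3.6 times as much fielddata memory in the new index.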


Handling vast (millions) number of results in Aggregations

2015-03-01 Thread Sharon Abu

Hello All,

I have a situation where an aggregation counts something and groups the 
counts per (mobile) device, which means the aggregation can return hundreds, 
hundreds of thousands, or even more results, and RAM might not be sufficient. 
I looked for a way to tell ES to write the results to another index instead 
of holding them in RAM, but could not find one.
The concept I'm talking about is similar to MongoDB's $out collection 
(http://docs.mongodb.org/manual/reference/operator/aggregation/out/) in 
map/reduce aggregations.

Appreciate any help on that important matter.

Thanks in advance
Sharon


scripted fields from the result of aggregations and the nested aggregations

2015-02-24 Thread cong yue

I have an aggregation as follows:
---
GET /ats/_search
{
  "size": 0,
  "aggs": {
"accessTimes": {
  "date_histogram": {
"field": "accessTime",
"interval": "hour"
  },
  "aggs": {
"hit_docs": {
  "filter": {
"terms": {
  "cacheResult": ["tcp_mem_hit","tcp_hit","tcp_refresh_hit"]
}

---
And the response is 
-
{

   "aggregations": {
  "accessTimes": {
 "buckets": [
{
   "key_as_string": "2015-02-19T11:00:00.000Z",
   "key": 142434360,
   "doc_count": 14,
   "hit_docs": {
  "doc_count": 13
   }
},
{
   "key_as_string": "2015-02-19T12:00:00.000Z",
   "key": 142434720,
   "doc_count": 52,
   "hit_docs": {
  "doc_count": 41
   }
},
{
   "key_as_string": "2015-02-19T13:00:00.000Z",
   "key": 142435080,
   "doc_count": 231,
   "hit_docs": {
  "doc_count": 136
   }
},
{
   "key_as_string": "2015-02-19T14:00:00.000Z",
   "key": 142435440,
   "doc_count": 13161,
   "hit_docs": {
  "doc_count": 8957
   }
},
{
   "key_as_string": "2015-02-19T15:00:00.000Z",
   "key": 142435800,
   "doc_count": 8971,
   "hit_docs": {
  "doc_count": 5631
   }
...
-

I want to know whether there is some way to access the fields 
*aggregations.buckets.doc_count* and *aggregations.buckets.hit_docs.doc_count* 
and calculate the hit ratio from these two values. Finally, I want to 
view it in Kibana 4.
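
As an illustration of the calculation I have in mind, this is how the ratio could be computed on the client side in Python from the response above. It is only a sketch: the function name `hit_ratios` is made up, the response is cut down to the first two buckets, and it does not solve the problem of showing the value in Kibana 4.

```python
def hit_ratios(response, histogram="accessTimes", hits="hit_docs"):
    """Map each histogram bucket to hit_docs.doc_count / doc_count."""
    ratios = {}
    for bucket in response["aggregations"][histogram]["buckets"]:
        total = bucket["doc_count"]
        hit = bucket[hits]["doc_count"]
        # Empty buckets have no meaningful ratio.
        ratios[bucket["key_as_string"]] = hit / total if total else None
    return ratios

# First two buckets of the response shown above.
response = {"aggregations": {"accessTimes": {"buckets": [
    {"key_as_string": "2015-02-19T11:00:00.000Z", "doc_count": 14,
     "hit_docs": {"doc_count": 13}},
    {"key_as_string": "2015-02-19T12:00:00.000Z", "doc_count": 52,
     "hit_docs": {"doc_count": 41}},
]}}}

for hour, ratio in hit_ratios(response).items():
    print(hour, f"{ratio:.3f}")
# 2015-02-19T11:00:00.000Z 0.929
# 2015-02-19T12:00:00.000Z 0.788
```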

Thanks,
Cong



Re: Bad performance on aggregations

2015-02-24 Thread Octavian

BTW, I'm running ES 1.4.2 with Java 7

On Tuesday, February 24, 2015 at 5:28:50 PM UTC+2, Octavian wrote:
>
> Hello, 
>
> I have a problem with the performance of aggregations: the aggregation 
> takes far too long.
>
> I'm doing the next aggregation over an index with 160M documents (16G of 
> data).
>
> {
>   "query": {
> "filtered": {
>   "filter": {
> "range": {
>   "_cache": false,
>   "insert_date": {
> "gte": 1424790449432
>   }
> }
>   }
> }
>   },
>   "aggs": {
> "tag": {
>   "terms": {
> "field": "origin_ip"
>   }
> }
>   }
> }
>
> Time: 18s. No results found (The result is correct. There are no documents 
> with insert_date greater than 1424790449432)
>
> However if I'm doing the next search:
> {
>   "query": {
> "filtered": {
>   "filter": {
> "range": {
>   "_cache": false,
>   "insert_date": {
> "gte": 1424790449432
>   }
> }
>   }
> }
>   }
> }
>
> Time: 7ms  . No results found. (As I already wrote, the result is correct).
>
> What is happening?
>
> In documentation (
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_filtered_query.html),
>  
> it is written :"The query (which happens to include a filter) returns a 
> certain subset of documents, and the aggregation operates on those 
> documents."
>
> In my situation, there are no elements in the subset of documents returned 
> by the filter, so the aggregation should run in the same amount of time 
> like the search.
>
> So, how can I improve the performance of that aggregation?
>
> Thank you,
>


Bad performance on aggregations

2015-02-24 Thread Octavian

Hello, 

I have a problem with the performance of aggregations: the aggregation 
takes far too long.

I'm running the following aggregation over an index with 160M documents (16G of 
data).

{
  "query": {
"filtered": {
  "filter": {
"range": {
  "_cache": false,
  "insert_date": {
"gte": 1424790449432
  }
}
  }
}
  },
  "aggs": {
"tag": {
  "terms": {
"field": "origin_ip"
  }
}
  }
}

Time: 18s. No results found (the result is correct: there are no documents 
with insert_date greater than 1424790449432).

However, if I run the following search:
{
  "query": {
"filtered": {
  "filter": {
"range": {
  "_cache": false,
  "insert_date": {
"gte": 1424790449432
  }
}
  }
}
  }
}

Time: 7ms. No results found (as noted above, this result is correct).

What is happening?

The documentation 
(http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_filtered_query.html) 
says: "The query (which happens to include a filter) returns a 
certain subset of documents, and the aggregation operates on those 
documents."

In my situation, the filter returns an empty subset of documents, so I would 
expect the aggregation to run in about the same amount of time as the search.

So, how can I improve the performance of that aggregation?

Thank you,


Re: Aggregations on multiple children cause wrong doc_count

2015-02-23 Thread matthias



Here are the mappings:

{
  "discussion" : {
"mappings" : {
  "groupMembership" : {
"_parent" : {
  "type" : "group"
},
"_routing" : {
  "required" : true
},
"properties" : {
  "approved" : {
"type" : "boolean",
"store" : true
  },
  "id" : {
"type" : "string",
"index" : "not_analyzed"
  },
  "parent" : {
"type" : "string",
"index" : "not_analyzed",
"store" : true
  },
  "userId" : {
"type" : "string",
"index" : "not_analyzed"
  }
}
  },
  "group" : {
"properties" : {
  "createdDate" : {
"type" : "date",
"format" : "dateOptionalTime"
  },
  "deletedDate" : {
"type" : "date",
"format" : "dateOptionalTime"
  },
  "description" : {
"type" : "string"
  },
  "id" : {
"type" : "string",
"index" : "not_analyzed"
  },
  "modifiedDate" : {
"type" : "date",
"format" : "dateOptionalTime"
  },
  "name" : {
"type" : "string",
"store" : true
  },
  "targetType" : {
"type" : "string"
  },
  "tenantId" : {
"type" : "string",
"index" : "not_analyzed"
  },
  "visibility" : {
"type" : "string",
"index" : "not_analyzed"
  }
}
  },
  "discussion" : {
"_parent" : {
  "type" : "group"
},
"_routing" : {
  "required" : true
},
"properties" : {
  "authorId" : {
"type" : "string",
"index" : "not_analyzed"
  },
  "createdDate" : {
"type" : "date",
"format" : "date_optional_time"
  },
  "deletedDate" : {
"type" : "date",
"format" : "date_optional_time"
  },
  "id" : {
"type" : "string",
"index" : "not_analyzed"
  },
  "message" : {
"type" : "string"
  },
  "modifiedDate" : {
"type" : "date",
"format" : "date_optional_time"
  },
  "parent" : {
"type" : "string",
"index" : "not_analyzed"
  },
  "subject" : {
"type" : "string",
"fields" : {
  "subject.raw" : {
"type" : "string",
"index" : "not_analyzed"
  }
}
  },
  "tenantId" : {
"type" : "string",
"index" : "not_analyzed"
  }
}
  }
}
  }
}

And some sample objects:
{
  "_index" : "discussion",
  "_type" : "group",
  "_id" : "7cc810d9-1412-48d0-97bd-830543230624",
  "_score" : 1.0,
  "_source":{"id":"7cc810d9-1412-48d0-97bd-830543230624","name":"CHAPTER 
19. The Prophet.","description":"\"Shipmates, have ye shipped in that 
ship?\"","visibility":"PUBLIC","targetType":null,"createdDate":1424460745000,"modifiedDate":1424460745000,"deletedDate":null,"tenantId":"18b37386-dd53-496a-82b4-26ef56d4fdf9"}
}, {
  "_index" : "discussion",
  "_type" : "groupMembership",
  "_id" : "f8beda9a-87cd-4783-b369-645aa1fecc8a",
  "_score" : 1.0,
  
"_source":{"id":"f8beda9a-87cd-4783-b369-645aa1fecc8a","userId":"54570d06-44ae-4241-9118-f74ea5345703","parent":"7cc810d9-1412-48d0-97bd-830543230624","approved":true}
}, {
  "_index" : "discussion",
  "_type" : "groupMembership",
  "_id" : "98894b52-c764-4b4b-b737-dc9393a5a6b1",
  "_score" : 1.0,
  
"_source":{"id":"98894b52-c764-4b4b-b737-dc9393a5a6b1","userId":"5c04f5bb-8d75-47bb-87a0-095ea3e05818","parent":"7cc810d9-1412-48d0-97bd-830543230624","approved":true}
}, {
  "_index" : "discussion",
  "_type" : "groupMembership",
  "_id" : "f51b93d8-55d9-4a3c-aa4a-e7c2abcd3bbe",
  "_score" : 1.0,
  
"_source":{"id":"f51b93d8-55d9-4a3c-aa4a-e7c2abcd3bbe","userId":"11cdfd7c-59a4-49a8-b3d9-62c4b49bf761","parent":"7cc810d9-1412-48d0-97bd-830543230624","approved":true}
}, {
  "_index" : "discussion",
  "_type" : "groupMembership",
  "_id" : "145b9ae9-daf4-4de7-8245-74453565744b",
  "_score" : 1.0,
  
"_source":{"id":"145b9ae9-daf4-4de7-8245-74453565744b","userId":"c5c103a3-c7c5-41ec-b1c8-0e4aa6861344","parent":"7cc810d9-1412-48d0-97bd-830543230624","approved":true}
}, {
  "_index" : "discussion",
  "_type" : "discussion",
  "_id" : "af2d431d-b8bc-49cc-87cd-6bbe724ad4b8",
  "_score" : 1.0,
  
"_source":{"id":"af2d431d-b8bc-49cc-87cd-6bbe724ad4b8","authorId":"11cdfd7c-59a4-49a8-b3d9-62c4b49bf761","parent":"7cc810d9-1412-48d0-97bd-830543230624","subject":"\

Re: Aggregations on multiple children cause wrong doc_count

2015-02-21 Thread Mark Harwood

Hard to diagnose the query behaviour without the docs + mappings.
Can you post examples?

On Friday, February 20, 2015 at 10:26:19 PM UTC, matt...@netbrains.com 
wrote:
>
> I have an index which consists of Groups, Members and Discussions. Both 
> Discussions and Members are children of Groups.
> When I run an aggregation query to find out how many Members and 
> Discussions there are for specific Group(s), whichever aggregation comes 
> second ends up with a doubled doc_count.
>
> My query looks like this; membershipCount ends up being double the 
> correct value:
>
> {
>   "query" : {
> "ids" : {
>   "type" : "group",
>   "values" : [ "group_id" ]
> }
>   },
>   "aggregations" : {
> "groups" : {
>   "terms" : {
> "field" : "id"
>   },
>   "aggregations" : {
> "discussionCount" : {
>   "children" : {
> "type" : "discussion"
>   }
> },
> "membershipCount" : {
>   "children" : {
> "type" : "groupMembership"
>   }
> }
>   }
> }
>   }
> }
>
> If I simply swap membershipCount and discussionCount, making 
> discussionCount the second sub-aggregation, the doc_count for memberships 
> is correct but discussionCount ends up doubled.
>
> I'm able to work around this behavior by adding a cardinality 
> sub-aggregation like this:
>
> {
>   "query" : {
> "ids" : {
>   "type" : "group",
>   "values" : [ "group_id" ]
> }
>   },
>   "aggregations" : {
> "groups" : {
>   "terms" : {
> "field" : "id"
>   },
>   "aggregations" : {
> "discussionCount" : {
>   "children" : {
> "type" : "discussion"
>   },
>   "aggregations" : {
> "activeDiscussionAggs" : {
>   "cardinality" : {
> "field" : "id"
>   }
> }
>   }
> },
> "membershipCount" : {
>   "children" : {
> "type" : "groupMembership"
>   },
>   "aggregations" : {
> "groupMembers" : {
>   "cardinality" : {
> "field" : "id"
>   }
> }
>   }
> }
>   }
> }
>   }
> }
>
> Adding "collect_mode" : "breadth_first" to the terms aggregation makes it 
> worse: the first doc_count ends up being 3 times the correct count and 
> the second aggregation's count 6 times the correct count.
> With cardinality everything works fine, but I'd greatly appreciate it if 
> anyone could explain this behavior or tell me whether my aggregation 
> should be done a different way.
>


Aggregations on multiple children cause wrong doc_count

2015-02-20 Thread matthias

I have an index which consists of Groups, Members and Discussions. Both 
Discussions and Members are children of Groups.
When I run an aggregation query to find out how many Members and 
Discussions there are for specific Group(s), whichever children 
aggregation comes second ends up with a doubled doc_count.

My query looks like this; membershipCount ends up being double the 
correct value:

{
  "query" : {
    "ids" : {
      "type" : "group",
      "values" : [ "group_id" ]
    }
  },
  "aggregations" : {
    "groups" : {
      "terms" : {
        "field" : "id"
      },
      "aggregations" : {
        "discussionCount" : {
          "children" : {
            "type" : "discussion"
          }
        },
        "membershipCount" : {
          "children" : {
            "type" : "groupMembership"
          }
        }
      }
    }
  }
}

If I simply swap membershipCount and discussionCount, making 
discussionCount the second sub-aggregation, the doc_count for memberships 
is correct but discussionCount ends up doubled.

I'm able to work around this behavior by adding a cardinality 
sub-aggregation like this:

{
  "query" : {
    "ids" : {
      "type" : "group",
      "values" : [ "group_id" ]
    }
  },
  "aggregations" : {
    "groups" : {
      "terms" : {
        "field" : "id"
      },
      "aggregations" : {
        "discussionCount" : {
          "children" : {
            "type" : "discussion"
          },
          "aggregations" : {
            "activeDiscussionAggs" : {
              "cardinality" : {
                "field" : "id"
              }
            }
          }
        },
        "membershipCount" : {
          "children" : {
            "type" : "groupMembership"
          },
          "aggregations" : {
            "groupMembers" : {
              "cardinality" : {
                "field" : "id"
              }
            }
          }
        }
      }
    }
  }
}

Adding "collect_mode" : "breadth_first" to the terms aggregation makes it 
worse: the first doc_count ends up being 3 times the correct count and 
the second aggregation's count 6 times the correct count.
With cardinality everything works fine, but I'd greatly appreciate it if 
anyone could explain this behavior or tell me whether my aggregation 
should be done a different way.
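
The cardinality workaround described above can also be assembled programmatically. What follows is only a minimal sketch in Python that builds the request body as a plain dict; it does not explain the doubled doc_count. The helper names `children_count` and `build_group_counts_query` and the sub-aggregation name `unique` are hypothetical, while the field `id` and the type names come from the queries in this message.

```python
import json


def children_count(child_type, id_field="id"):
    """Build a `children` aggregation with a `cardinality` sub-aggregation.

    The cardinality value is meant to be read instead of the bucket's
    doc_count. The sub-aggregation name "unique" is an assumption.
    """
    return {
        "children": {"type": child_type},
        "aggregations": {
            "unique": {"cardinality": {"field": id_field}}
        },
    }


def build_group_counts_query(group_ids):
    """Build the search body for one or more group ids."""
    return {
        "query": {"ids": {"type": "group", "values": list(group_ids)}},
        "aggregations": {
            "groups": {
                "terms": {"field": "id"},
                "aggregations": {
                    "discussionCount": children_count("discussion"),
                    "membershipCount": children_count("groupMembership"),
                },
            }
        },
    }


body = build_group_counts_query(["group_id"])
print(json.dumps(body, indent=2))
```

Keep in mind that `cardinality` returns an approximate distinct count, so for very large groups the value may deviate slightly from the exact number of children.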


Re: Elasticsearch Script merge the results of two aggregations

2015-02-19 Thread Masaru Hasegawa

Hi,

It looks like you are using Lucene expressions [1]. See the link for their 
limitations; today they only support numeric values.
Since your terms agg doesn't set a "lang" property, you probably have 
"script.default_lang" set to "expression" in elasticsearch.yml?

FYI, if you add "lang": "groovy" to the terms agg (and if your configuration 
allows running dynamic Groovy scripts), your query should work.
But make sure you read the release notes [2] before turning on dynamic Groovy 
scripting. (You can also use Groovy scripts without turning on dynamic 
scripting [3].)


Masaru

[1] 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-scripting.html#_lucene_expressions_scripts
[2] http://www.elasticsearch.org/blog/elasticsearch-1-4-3-and-1-3-8-released/
[3] 
http://www.elasticsearch.org/blog/running-groovy-scripts-without-dynamic-scripting/
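
As a rough illustration of the "lang": "groovy" suggestion above, here is a minimal Python sketch that builds the request body from the original question with an explicit script language on the terms agg. The helper name `brand_terms_query` is hypothetical, and the sketch assumes an Elasticsearch 1.x cluster whose configuration permits dynamic Groovy scripts; it only constructs the JSON and sends nothing.

```python
import json


def brand_terms_query(script, lang="groovy"):
    """Build the search body with an explicit script language.

    Setting "lang" on the terms agg overrides script.default_lang for
    this one script only.
    """
    return {
        "query": {"match_all": {}},
        "aggs": {
            "Brand": {
                "terms": {
                    "script": script,
                    "lang": lang,
                    "size": 0,
                }
            }
        },
    }


body = brand_terms_query("doc['brandName'].value+'|'+doc['brandLink'].value")
print(json.dumps(body, indent=2))
```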


On February 19, 2015 at 22:21:41, ali balci (balci.a...@gmail.com) wrote:
> The second error:
>  
> {
> "error": "SearchPhaseExecutionException[Failed to execute phase
> [query_fetch], all shards failed; shardFailures {[g][test][0]:
> RemoteTransportException[[Mammomax][inet[/192.168.1.8:9300]][search/phase/query+fetch]];
>   
> nested: SearchParseException[[test][0]:
> query[ConstantScore(*:*)],from[-1],size[-1]: Parse Failure [Failed to
> parse source 
> [{\"query\":{\"match_all\":{}},\"aggs\":{\"Brand\":{\"terms\":{\"script\":\"doc['brandName'].value+'|'+doc['brandLink'].value\",\"size\":0]]];
>   
> nested: ExpressionScriptCompilationException[Failed to parse
> expression: doc['brandName'].value+'|'+doc['brandLink'].value];
> nested: ParseException[ unexpected character ''' at position (23).];
> nested: NoViableAltException; }]",
> "status": 400
> }
>  
>  
> 2015-02-19 14:49 GMT+02:00 ali balci :
>  
> >
> > I use aggregations on Elasticsearch version 1.3.8. I have been using an 
> > aggregation script for a while, but today it stopped working. Please 
> > help, I can't find any solution:
> >
> > This is the mapping:
> >
> > "mappings": {
> > "product": {
> > "properties": {
> > "brandId": {
> > "type": "integer"
> > },
> > "brandIsActive": {
> > "type": "boolean"
> > },
> > "brandLink": {
> > "type": "string",
> > "index": "not_analyzed"
> > },
> > "brandName": {
> > "type": "string",
> > "index": "not_analyzed"
> > }
> >
> > }
> >
> > }
> >
> > }
> >
> >
> > this my query:
> >
> >
> > post alias-test/product/_search
> > {
> > "query": {
> > "match_all": {}
> > },
> > "aggs": {
> > "Brand": {
> > "terms": {
> > "script": "doc['brandName'].value",
> > "size": 0
> > }
> > }
> > }
> > }
> >
> > This is the error:
> >
> > {
> > "error": "SearchPhaseExecutionException[Failed to execute phase 
> > [query_fetch],  
> all shards failed; shardFailures {[g][mizu-20150219142655][0]: 
> RemoteTransportException[[Mammomax][inet[/172.31.37.148:9300]][search/phase/query+fetch]];
>   
> nested: SearchParseException[[mizu-20150219142655][0]: 
> query[ConstantScore(*:*)],from[-1],size[-1]:  
> Parse Failure [Failed to parse source 
> [{\"query\":{\"match_all\":{}},\"aggs\":{\"Brand\":{\"terms\":{\"script\":\"doc['brandName'].value\"]]];
>   
> nested: ExpressionScriptCompilationException[Field [brandName] used in 
> expression  
> must be numeric]; }]",
> > "status": 400
> > }
> >
> >
> > The other query :
> >
> > post test/product/_search
> > {
> > "query": {
> > "match_all": {}
> > },
> > "aggs": {
> > "Brand": {
> > "terms": {
> > "script": "doc['brandName'].value+'|'+doc['brandLink'].value",
> > "size": 0
> > }
> > }
> > }
> > }
> >
> >
> > the error :
> >
> > post test/product/_search
> > {
> > "query": {
> > "match_all": {}
> > },
> > "aggs": {
> > "Brand": {
> > "terms": {
> > "script": "doc['brandName'].value+'|'+doc['brandLink'].value",
> > "size": 0
> > }
> > }
> > }
> > }
> >
>  
>  
>  
> --
> Best Regards
>  
> ALİ BALCI
> Computer Engineering
> Tel:0543 699 59 88
>  

Re: Elasticsearch Script merge the results of two aggregations

2015-02-19 Thread ali balci

The second error:

{
   "error": "SearchPhaseExecutionException[Failed to execute phase
[query_fetch], all shards failed; shardFailures {[g][test][0]:
RemoteTransportException[[Mammomax][inet[/192.168.1.8:9300]][search/phase/query+fetch]];
nested: SearchParseException[[test][0]:
query[ConstantScore(*:*)],from[-1],size[-1]: Parse Failure [Failed to
parse source 
[{\"query\":{\"match_all\":{}},\"aggs\":{\"Brand\":{\"terms\":{\"script\":\"doc['brandName'].value+'|'+doc['brandLink'].value\",\"size\":0]]];
nested: ExpressionScriptCompilationException[Failed to parse
expression: doc['brandName'].value+'|'+doc['brandLink'].value];
nested: ParseException[ unexpected character ''' at position (23).];
nested: NoViableAltException; }]",
   "status": 400
}


2015-02-19 14:49 GMT+02:00 ali balci :

>
> I use aggregations on Elasticsearch version 1.3.8. I have been using an 
> aggregation script for a while, but today it stopped working. Please help, 
> I can't find any solution:
>
> This is the mapping:
>
>   "mappings": {
>  "product": {
> "properties": {
>"brandId": {
>   "type": "integer"
>},
>"brandIsActive": {
>   "type": "boolean"
>},
>"brandLink": {
>   "type": "string",
>   "index": "not_analyzed"
>},
>"brandName": {
>   "type": "string",
>   "index": "not_analyzed"
>}
>
>   }
>
>}
>
> }
>
>
> this my query:
>
>
> post alias-test/product/_search
> {
> "query": {
> "match_all": {}
> },
> "aggs": {
>"Brand": {
>  "terms": {
>  "script": "doc['brandName'].value",
>  "size": 0
>   }
> }
>  }
> }
>
> This is the error:
>
> {
>"error": "SearchPhaseExecutionException[Failed to execute phase 
> [query_fetch], all shards failed; shardFailures {[g][mizu-20150219142655][0]: 
> RemoteTransportException[[Mammomax][inet[/172.31.37.148:9300]][search/phase/query+fetch]];
>  nested: SearchParseException[[mizu-20150219142655][0]: 
> query[ConstantScore(*:*)],from[-1],size[-1]: Parse Failure [Failed to parse 
> source 
> [{\"query\":{\"match_all\":{}},\"aggs\":{\"Brand\":{\"terms\":{\"script\":\"doc['brandName'].value\"]]];
>  nested: ExpressionScriptCompilationException[Field [brandName] used in 
> expression must be numeric]; }]",
>"status": 400
> }
>
>
> The other query :
>
> post test/product/_search
> {
> "query": {
> "match_all": {}
> },
> "aggs": {
>"Brand": {
>  "terms": {
>  "script": "doc['brandName'].value+'|'+doc['brandLink'].value",
>  "size": 0
>   }
> }
>  }
> }
>
>
> the error :
>
> post test/product/_search
> {
> "query": {
> "match_all": {}
> },
> "aggs": {
>"Brand": {
>  "terms": {
>  "script": "doc['brandName'].value+'|'+doc['brandLink'].value",
>  "size": 0
>   }
> }
>  }
> }



-- 
Best Regards

ALİ BALCI
Computer Engineering
Tel:0543 699 59 88


Elasticsearch Script merge the results of two aggregations

2015-02-19 Thread ali balci



I use aggregations on Elasticsearch version 1.3.8. I have been using an 
aggregation script for a while, but today it stopped working. Please help, 
I can't find any solution:

This is the mapping:

"mappings": {
  "product": {
    "properties": {
      "brandId": {
        "type": "integer"
      },
      "brandIsActive": {
        "type": "boolean"
      },
      "brandLink": {
        "type": "string",
        "index": "not_analyzed"
      },
      "brandName": {
        "type": "string",
        "index": "not_analyzed"
      }
    }
  }
}


This is my query:

POST alias-test/product/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "Brand": {
      "terms": {
        "script": "doc['brandName'].value",
        "size": 0
      }
    }
  }
}

This is the error:

{
   "error": "SearchPhaseExecutionException[Failed to execute phase 
[query_fetch], all shards failed; shardFailures {[g][mizu-20150219142655][0]: 
RemoteTransportException[[Mammomax][inet[/172.31.37.148:9300]][search/phase/query+fetch]];
 nested: SearchParseException[[mizu-20150219142655][0]: 
query[ConstantScore(*:*)],from[-1],size[-1]: Parse Failure [Failed to parse 
source 
[{\"query\":{\"match_all\":{}},\"aggs\":{\"Brand\":{\"terms\":{\"script\":\"doc['brandName'].value\"]]];
 nested: ExpressionScriptCompilationException[Field [brandName] used in 
expression must be numeric]; }]",
   "status": 400
}


The other query:

POST test/product/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "Brand": {
      "terms": {
        "script": "doc['brandName'].value+'|'+doc['brandLink'].value",
        "size": 0
      }
    }
  }
}


The error:

post test/product/_search
{
"query": {
"match_all": {}
},
"aggs": {
   "Brand": {
 "terms": {
 "script": "doc['brandName'].value+'|'+doc['brandLink'].value",
 "size": 0
  } 
} 
 }  
}
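
One possible way to get brandName/brandLink pairs without any script is to nest a terms aggregation on brandLink inside one on brandName and join the keys on the client. This is an alternative technique, not what the query above does. In the Python sketch below, the sub-aggregation name `links`, the helper names, and the sample response are all hypothetical; the response is hand-written for illustration, not real cluster output.

```python
def brand_pairs_query():
    """Script-free request body: terms on brandName, sub-terms on brandLink."""
    return {
        "query": {"match_all": {}},
        "aggs": {
            "Brand": {
                "terms": {"field": "brandName", "size": 0},
                "aggs": {
                    "links": {"terms": {"field": "brandLink", "size": 0}}
                },
            }
        },
    }


def flatten_brand_buckets(response, sep="|"):
    """Join each brand key with its link keys, e.g. "name|link"."""
    pairs = []
    for brand in response["aggregations"]["Brand"]["buckets"]:
        for link in brand["links"]["buckets"]:
            pairs.append(brand["key"] + sep + link["key"])
    return pairs


# Hand-written sample in the shape of a nested terms response.
sample_response = {
    "aggregations": {
        "Brand": {
            "buckets": [
                {"key": "Acme", "doc_count": 3,
                 "links": {"buckets": [{"key": "/acme", "doc_count": 3}]}},
                {"key": "Zed", "doc_count": 1,
                 "links": {"buckets": [{"key": "/zed", "doc_count": 1}]}},
            ]
        }
    }
}

pairs = flatten_brand_buckets(sample_response)
print(pairs)
```

Because both fields are mapped as not_analyzed strings, a plain field-based terms aggregation sees each value as a single term, which is what makes this approach workable here.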


Re: Aggregations failing on fields with custom analyzer..

2015-02-19 Thread Anil Karaka

I understand what you are saying. I was able to reproduce the error you 
showed myself.

I was not able to insert a string into your index, where the field is mapped 
as "double", but I am able to insert a string into my older index, where the 
field is also mapped as "double". Very weird.
But I don't know how you could reproduce my case.

I'm using this index template, 
https://gist.github.com/syllogismos/c2dde4f097fea149e1a0, and then 
reindexed from an older index. The field ended up mapped as double, yet the 
documents indexed later contain strings in it.

Thanks for your help..
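
To spot this kind of mismatch between a mapping and the documents, one could compare a document's _source with the field types from the mapping. The following is a minimal sketch in plain Python under stated assumptions: the helper names are hypothetical, the mapping and document are trimmed copies of the ones quoted in this thread, and nothing is fetched from a cluster.

```python
NUMERIC_TYPES = {"byte", "short", "integer", "long", "float", "double"}


def is_number_like(value):
    """True for numbers and for strings that parse as a number."""
    if isinstance(value, bool):
        return False
    if isinstance(value, (int, float)):
        return True
    if isinstance(value, str):
        try:
            float(value)
            return True
        except ValueError:
            return False
    return False


def find_type_mismatches(properties, source):
    """Return (field, mapped_type, value) for non-numeric values in numeric fields."""
    mismatches = []
    for field, value in source.items():
        if value is None:
            continue  # null values are not a type conflict
        mapped = properties.get(field, {}).get("type")
        if mapped in NUMERIC_TYPES and not is_number_like(value):
            mismatches.append((field, mapped, value))
    return mismatches


# Trimmed from the mapping and sample document quoted in this thread.
properties = {
    "os": {"type": "string"},
    "categoryid": {"type": "double"},
}
source = {
    "os": "Windows",
    "categoryid": "home-kitchen-curtains-blinds",
}

mismatches = find_type_mismatches(properties, source)
print(mismatches)
```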

On Thursday, February 19, 2015 at 2:34:14 PM UTC+5:30, David Pilato wrote:
>
> If you can provide a full example working as I did, we can try it and see 
> what is wrong.
>
> -- 
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
> @dadoonet | @elasticsearchfr | @scrutmydocs
>
> On 19 Feb 2015, at 10:01, Anil Karaka wrote:
>
> I'm getting this error as well using your PUT requests.
>
> It feels like I'm doing something wrong, but I don't know what exactly.
>
> I'm using this index template: 
> https://gist.github.com/syllogismos/c2dde4f097fea149e1a0
>
> I didn't specify a particular mapping for my index but reindexed from a 
> previous index, and ended up with that mapping and documents that look 
> like the ones above. Am I seeing things, or missing an obvious mistake? So 
> lost right now.
>
> On Thursday, February 19, 2015 at 2:23:10 PM UTC+5:30, David Pilato wrote:
>>
>> I think you are doing something wrong.
>>
>> DELETE index
>> PUT index
>> {
>>   "mappings": {
>> "doc": {
>>   "properties": {
>> "foo": {
>>   "type": "double"
>> }
>>   }
>> }
>>   }
>> }
>> PUT index/doc/1
>> {
>>   "foo": "bar"
>> }
>>
>> gives:
>>
>> {
>>"error": "MapperParsingException[failed to parse [foo]]; nested: 
>> NumberFormatException[For input string: \"bar\"]; ",
>>"status": 400
>> }
>>
>> -- 
>> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
>> @dadoonet | @elasticsearchfr | @scrutmydocs
>>
>> On 19 Feb 2015, at 09:39, Anil Karaka wrote:
>>
>> "_source" : {
>> "Sort" : "",
>> "gt" : "2015-02-18T15:07:10",
>> "uid" : "54867dc55b482b04da7f23d8",
>> "usId" : "54867dc55b482b04da7f23d7",
>> "ut" : "2015-02-18T20:37:10",
>> "act" : "productlisting",
>> "st" : "2015-02-18T15:07:46",
>> "Filter" : "",
>> "av" : "3.0.0.0",
>> "ViewType" : "SmallSingleList",
>> "os" : "Windows",
>> "categoryid" : "home-kitchen-curtains-blinds"
>> }
>>
>> "properties" : {
>> "uid" : {
>> "analyzer" : "case_insensitive_keyword_analyzer",
>> "type" : "string"
>> },
>> "ViewType" : {
>> "analyzer" : "case_insensitive_keyword_analyzer",
>> "type" : "string"
>> },
>> "usId" : {
>> "analyzer" : "case_insensitive_keyword_analyzer",
>> "type" : "string"
>> },
>> "os" : {
>> "analyzer" : "case_insensitive_keyword_analyzer",
>> "type" : "string"
>> },
>> "Sort" : {
>> "analyzer" : "case_insensitive_keyword_analyzer",
>> "type" : "string"
>> },
>> "Filter" : {
>> "analyzer" : "case_insensitive_keyword_analyzer",
>> "type" : "string"
>> },
>> "categoryid" : {
>> "type" : "double"
>> },
>> "gt" : {
>> "format" : "dateOptionalTime",
>> "type" : "date"
>> },
>> "ut" : {
>> "format" : "dateOptionalTime",
>> "type" : "date"
>> },
>> "st" : {
>> "format" : "dateOptionalTime",
>> "type" : "date"
>> },
>> "act" : {
>> "analyzer" : "case_insensitive_keyword_analyzer",
>> "type" : "string"
>> },
>> "av" : {
>> "analyzer" : "case_insensitive_keyword_analyzer",
>> "type" : "string"
>> }
>> }
>>
>>
>> A sample document and the index mappings above..
>>
>>
>> On Thursday, February 19, 2015 at 2:03:11 PM UTC+5:30, David Pilato wrote:
>>
>> I can't tell without a concrete example.
>> I'd say that if the field is mapped as a number type and you send "123", 
>> it could work. 
>>
>> -- 
>> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
>> @dadoonet | @elasticsearchfr | @scrutmydocs
>>
>> On 19 Feb 2015, at 09:30, Anil Karaka wrote:
>>
>> It was my mistake: the field I was trying to aggregate on was mapped as 
>> double. I assumed it was a string after seeing some sample documents with 
>> strings.
>>
>> Why didn't ES throw an error when I indexed docs with strings instead 
>> of doubles?
>>
>> On Thursday, February 19, 2015 at 1:35:08 PM UTC+5:30, David Pilato wrote:
>>
>> Did you apply your analyzer to your mapping?
>>
>> David
>>
>> On 19 Feb 2015, at 08:53, Anil Karaka wrote:
>>
>>
>> http://stackoverflow.com/questions/28601082/terms-aggregation-failing-on-string-fields-with-a-custom-analyzer-in-elasticsear

Re: Aggregations failing on fields with custom analyzer..

2015-02-19 Thread David Pilato

If you can provide a full example working as I did, we can try it and see what 
is wrong.

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

> On 19 Feb 2015, at 10:01, Anil Karaka wrote:
> 
> I'm getting this error as well using your PUT requests.
> 
> It feels like I'm doing something wrong, but I don't know what exactly.
> 
> I'm using this index template: 
> https://gist.github.com/syllogismos/c2dde4f097fea149e1a0
> 
> I didn't specify a particular mapping for my index but reindexed from a 
> previous index, and ended up with that mapping and documents that look like 
> the ones above. Am I seeing things, or missing an obvious mistake? So lost 
> right now.
> 
> On Thursday, February 19, 2015 at 2:23:10 PM UTC+5:30, David Pilato wrote:
> I think you are doing something wrong.
> 
> DELETE index
> PUT index
> {
>   "mappings": {
> "doc": {
>   "properties": {
> "foo": {
>   "type": "double"
> }
>   }
> }
>   }
> }
> PUT index/doc/1
> {
>   "foo": "bar"
> }
> 
> gives:
> 
> {
>"error": "MapperParsingException[failed to parse [foo]]; nested: 
> NumberFormatException[For input string: \"bar\"]; ",
>"status": 400
> }
> 
> -- 
> David Pilato | Technical Advocate | Elasticsearch.com
> @dadoonet | @elasticsearchfr | @scrutmydocs
> 
> On 19 Feb 2015, at 09:39, Anil Karaka wrote:
> 
> "_source" : {
> "Sort" : "",
> "gt" : "2015-02-18T15:07:10",
> "uid" : "54867dc55b482b04da7f23d8",
> "usId" : "54867dc55b482b04da7f23d7",
> "ut" : "2015-02-18T20:37:10",
> "act" : "productlisting",
> "st" : "2015-02-18T15:07:46",
> "Filter" : "",
> "av" : "3.0.0.0",
> "ViewType" : "SmallSingleList",
> "os" : "Windows",
> "categoryid" : "home-kitchen-curtains-blinds"
> }
> 
> "properties" : {
> "uid" : {
> "analyzer" : "case_insensitive_keyword_analyzer",
> "type" : "string"
> },
> "ViewType" : {
> "analyzer" : "case_insensitive_keyword_analyzer",
> "type" : "string"
> },
> "usId" : {
> "analyzer" : "case_insensitive_keyword_analyzer",
> "type" : "string"
> },
> "os" : {
> "analyzer" : "case_insensitive_keyword_analyzer",
> "type" : "string"
> },
> "Sort" : {
> "analyzer" : "case_insensitive_keyword_analyzer",
> "type" : "string"
> },
> "Filter" : {
> "analyzer" : "case_insensitive_keyword_analyzer",
> "type" : "string"
> },
> "categoryid" : {
> "type" : "double"
> },
> "gt" : {
> "format" : "dateOptionalTime",
> "type" : "date"
> },
> "ut" : {
> "format" : "dateOptionalTime",
> "type" : "date"
> },
> "st" : {
> "format" : "dateOptionalTime",
> "type" : "date"
> },
> "act" : {
> "analyzer" : "case_insensitive_keyword_analyzer",
> "type" : "string"
> },
> "av" : {
> "analyzer" : "case_insensitive_keyword_analyzer",
> "type" : "string"
> }
> }
> 
> 
> A sample document and the index mappings above..
> 
> 
> On Thursday, February 19, 2015 at 2:03:11 PM UTC+5:30, David Pilato wrote:
> I can't tell without a concrete example.
> I'd say that if the field is mapped as a number type and you send "123", it 
> could work. 
> 
> -- 
> David Pilato | Technical Advocate | Elasticsearch.com
> @dadoonet | @elasticsearchfr | @scrutmydocs
> 
> On 19 Feb 2015, at 09:30, Anil Karaka wrote:
> 
> It was my mistake: the field I was trying to aggregate on was mapped as 
> double. I assumed it was a string after seeing some sample documents with 
> strings.
> 
> Why didn't ES throw an error when I indexed docs with strings instead of 
> doubles?
> 
> On Thursday, February 19, 2015 at 1:35:08 PM UTC+5:30, David Pilato wrote:
> Did you apply your analyzer to your mapping?
> 
> David
> 
> On 19 Feb 2015, at 08:53, Anil Karaka wrote:
> 
> http://stackoverflow.com/questions/28601082/terms-aggregation-failing-on-string-fields-with-a-custom-analyzer-in-elasticsear
> 
> Posted on Stack Overflow as well.
> 
> On Thursday, February 19, 2015 at 1:01:40 PM UTC+5:30, Anil Karaka wrote:
> I wanted a custom analyzer that behaves exactly like not_analyzed, except 
> that fields are case insensitive..
> 
> I have my analyzer as below, 
> 
> "index": {
> "analysis": {
> "analyzer": { // Custom Analyzer with keyword tokenizer and 
> lowercase filter, same as not_analyzed but case insensitive
> "case_insensitive_keyword_analyzer": {
> "tokenizer": "keyword",
> "filter": "lowercase"
> }
> }
> }
>

Re: Aggregations failing on fields with custom analyzer..

2015-02-19 Thread Anil Karaka

I'm getting this error as well using your PUT requests.

It feels like I'm doing something wrong, but I don't know what exactly.

I'm using this index template: 
https://gist.github.com/syllogismos/c2dde4f097fea149e1a0

I didn't specify a particular mapping for my index but reindexed from a 
previous index, and ended up with that mapping and documents that look 
like the ones above. Am I seeing things, or missing an obvious mistake? So 
lost right now.

On Thursday, February 19, 2015 at 2:23:10 PM UTC+5:30, David Pilato wrote:
>
> I think you are doing something wrong.
>
> DELETE index
> PUT index
> {
>   "mappings": {
> "doc": {
>   "properties": {
> "foo": {
>   "type": "double"
> }
>   }
> }
>   }
> }
> PUT index/doc/1
> {
>   "foo": "bar"
> }
>
> gives:
>
> {
>"error": "MapperParsingException[failed to parse [foo]]; nested: 
> NumberFormatException[For input string: \"bar\"]; ",
>"status": 400
> }
>
> -- 
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
> @dadoonet | @elasticsearchfr | @scrutmydocs
>
> On 19 Feb 2015, at 09:39, Anil Karaka wrote:
>
> "_source" : {
> "Sort" : "",
> "gt" : "2015-02-18T15:07:10",
> "uid" : "54867dc55b482b04da7f23d8",
> "usId" : "54867dc55b482b04da7f23d7",
> "ut" : "2015-02-18T20:37:10",
> "act" : "productlisting",
> "st" : "2015-02-18T15:07:46",
> "Filter" : "",
> "av" : "3.0.0.0",
> "ViewType" : "SmallSingleList",
> "os" : "Windows",
> "categoryid" : "home-kitchen-curtains-blinds"
> }
>
> "properties" : {
> "uid" : {
> "analyzer" : "case_insensitive_keyword_analyzer",
> "type" : "string"
> },
> "ViewType" : {
> "analyzer" : "case_insensitive_keyword_analyzer",
> "type" : "string"
> },
> "usId" : {
> "analyzer" : "case_insensitive_keyword_analyzer",
> "type" : "string"
> },
> "os" : {
> "analyzer" : "case_insensitive_keyword_analyzer",
> "type" : "string"
> },
> "Sort" : {
> "analyzer" : "case_insensitive_keyword_analyzer",
> "type" : "string"
> },
> "Filter" : {
> "analyzer" : "case_insensitive_keyword_analyzer",
> "type" : "string"
> },
> "categoryid" : {
> "type" : "double"
> },
> "gt" : {
> "format" : "dateOptionalTime",
> "type" : "date"
> },
> "ut" : {
> "format" : "dateOptionalTime",
> "type" : "date"
> },
> "st" : {
> "format" : "dateOptionalTime",
> "type" : "date"
> },
> "act" : {
> "analyzer" : "case_insensitive_keyword_analyzer",
> "type" : "string"
> },
> "av" : {
> "analyzer" : "case_insensitive_keyword_analyzer",
> "type" : "string"
> }
> }
>
>
> A sample document and the index mappings above..
>
>
> On Thursday, February 19, 2015 at 2:03:11 PM UTC+5:30, David Pilato wrote:
>
> I can't tell without a concrete example.
> I'd say that if the field is mapped as a number type and you send "123", 
> it could work. 
>
> -- 
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
> @dadoonet | @elasticsearchfr | @scrutmydocs
>
> On 19 Feb 2015, at 09:30, Anil Karaka wrote:
>
> It was my mistake: the field I was trying to aggregate on was mapped as 
> double. I assumed it was a string after seeing some sample documents with 
> strings.
>
> Why didn't ES throw an error when I indexed docs with strings instead 
> of doubles?
>
> On Thursday, February 19, 2015 at 1:35:08 PM UTC+5:30, David Pilato wrote:
>
> Did you apply your analyzer to your mapping?
>
> David
>
> On 19 Feb 2015, at 08:53, Anil Karaka wrote:
>
> http://stackoverflow.com/questions/28601082/terms-aggregation-failing-on-string-fields-with-a-custom-analyzer-in-elasticsear
>
> Posted on Stack Overflow as well.
>
> On Thursday, February 19, 2015 at 1:01:40 PM UTC+5:30, Anil Karaka wrote:
>
> I wanted a custom analyzer that behaves exactly like not_analyzed, except 
> that fields are case insensitive..
>
> I have my analyzer as below, 
>
> "index": {
> "analysis": {
> "analyzer": { // Custom Analyzer with keyword tokenizer and 
> lowercase filter, same as not_analyzed but case insensitive
> "case_insensitive_keyword_analyzer": {
> "tokenizer": "keyword",
> "filter": "lowercase"
> }
> }
> }
> }
>
> But when I'm trying to do term aggregation over a field with strings analyzed 
> as above, I'm getting this error..
>
> {
> "error" 
> :"ClassCastException[org.elasticsearch.search.aggregations.bucket.terms.DoubleTerms$Bucket
>  cannot be cast to 
> org.elasticsearch.search.aggregations.bucket.terms.StringTerms$Bucket]",
> "status" : 500
> }
>
> Are there additional settings that I have to update in my custom analyzer for 
> my terms aggregation to work..?
>
>
> The better question is I want a custom

Re: Aggregations failing on fields with custom analyzer..

2015-02-19 Thread David Pilato

I think you are doing something wrong.

DELETE index

PUT index
{
  "mappings": {
    "doc": {
      "properties": {
        "foo": {
          "type": "double"
        }
      }
    }
  }
}

PUT index/doc/1
{
  "foo": "bar"
}

gives:

{
   "error": "MapperParsingException[failed to parse [foo]]; nested: 
NumberFormatException[For input string: \"bar\"]; ",
   "status": 400
}
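
The difference between sending "bar" and sending a numeric string such as "123" to a double field can be pictured with a small analogy. The Python sketch below is only an analogy, not Elasticsearch code: the helper `parse_double` is hypothetical, and Python's `float()` stands in for the Java number parsing, which differs from it in edge cases.

```python
def parse_double(raw):
    """Stand-in for the numeric parsing applied to a field mapped as "double"."""
    try:
        return float(raw)
    except ValueError:
        # Loosely mirrors the NumberFormatException message shown above.
        raise ValueError('For input string: "%s"' % raw) from None


# A numeric string is coerced to a number.
accepted = parse_double("123")

# A non-numeric string is rejected.
rejected = False
message = None
try:
    parse_double("bar")
except ValueError as exc:
    rejected = True
    message = str(exc)

print(accepted, rejected, message)
```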

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

> On 19 Feb 2015, at 09:39, Anil Karaka wrote:
> 
> "_source" : {
> "Sort" : "",
> "gt" : "2015-02-18T15:07:10",
> "uid" : "54867dc55b482b04da7f23d8",
> "usId" : "54867dc55b482b04da7f23d7",
> "ut" : "2015-02-18T20:37:10",
> "act" : "productlisting",
> "st" : "2015-02-18T15:07:46",
> "Filter" : "",
> "av" : "3.0.0.0",
> "ViewType" : "SmallSingleList",
> "os" : "Windows",
> "categoryid" : "home-kitchen-curtains-blinds"
> }
> 
> "properties" : {
> "uid" : {
> "analyzer" : "case_insensitive_keyword_analyzer",
> "type" : "string"
> },
> "ViewType" : {
> "analyzer" : "case_insensitive_keyword_analyzer",
> "type" : "string"
> },
> "usId" : {
> "analyzer" : "case_insensitive_keyword_analyzer",
> "type" : "string"
> },
> "os" : {
> "analyzer" : "case_insensitive_keyword_analyzer",
> "type" : "string"
> },
> "Sort" : {
> "analyzer" : "case_insensitive_keyword_analyzer",
> "type" : "string"
> },
> "Filter" : {
> "analyzer" : "case_insensitive_keyword_analyzer",
> "type" : "string"
> },
> "categoryid" : {
> "type" : "double"
> },
> "gt" : {
> "format" : "dateOptionalTime",
> "type" : "date"
> },
> "ut" : {
> "format" : "dateOptionalTime",
> "type" : "date"
> },
> "st" : {
> "format" : "dateOptionalTime",
> "type" : "date"
> },
> "act" : {
> "analyzer" : "case_insensitive_keyword_analyzer",
> "type" : "string"
> },
> "av" : {
> "analyzer" : "case_insensitive_keyword_analyzer",
> "type" : "string"
> }
> }
> 
> 
> A sample document and the index mappings above..
> 
> 
> On Thursday, February 19, 2015 at 2:03:11 PM UTC+5:30, David Pilato wrote:
> I don’t know without a concrete example.
> I’d say that if you map have a type number and you send "123" it could work. 
> 
> -- 
> David Pilato | Technical Advocate | Elasticsearch.com 
> 
> @dadoonet  | @elasticsearchfr 
>  | @scrutmydocs 
> 
> 
> 
> 
>> Le 19 févr. 2015 à 09:30, Anil Karaka > a 
>> écrit :
>> 
>> It was my mistake, the field I was trying to do an aggregation was mapped 
>> double, I assumed its a string, after seeing some sample documents with 
>> strings..
>> 
>> Why didn't es throw an error when I'm indexing docs with strings instead of 
>> double..?
>> 
>> On Thursday, February 19, 2015 at 1:35:08 PM UTC+5:30, David Pilato wrote:
>> Did you apply your analyzer to your mapping?
>> 
>> David
>> 
>> On 19 Feb 2015, at 08:53, Anil Karaka wrote:
>> 
>>> http://stackoverflow.com/questions/28601082/terms-aggregation-failing-on-string-fields-with-a-custom-analyzer-in-elasticsear
>>>  
>>> 
>>> 
>>> Posted in stack over flow as well..
>>> 
>>> On Thursday, February 19, 2015 at 1:01:40 PM UTC+5:30, Anil Karaka wrote:
>>> I wanted a custom analyzer that behaves exactly like not_analyzed, except 
>>> that fields are case insensitive..
>>> 
>>> I have my analyzer as below, 
>>> 
>>> "index": {
>>> "analysis": {
>>> "analyzer": { // Custom Analyzer with keyword tokenizer and 
>>> lowercase filter, same as not_analyzed but case insensitive
>>> "case_insensitive_keyword_analyzer": {
>>> "tokenizer": "keyword",
>>> "filter": "lowercase"
>>> }
>>> }
>>> }
>>> }
>>> 
>>> But when I'm trying to do term aggregation over a field with strings 
>>> analyzed as above, I'm getting this error..
>>> 
>>> {
>>> "error" 
>>> :"ClassCastException[org.elasticsearch.search.aggregations.bucket.terms.DoubleTerms$Bucket
>>>  cannot be cast to 
>>> org.elasticsearch.search.aggregations.bucket.terms.StringTerms$Bucket]",
>>> "status" : 500
>>> }
>>> 
>>> Are there additional settings that I have to update in my custom analyzer 
>>> for my terms aggregation to work..?
>>> 
>>> 
>>> The better question is I want a custom analyzer that does everything 
>>> similar to not_analyzed but is case insensitive.. How do I achieve that?
>>> 
>>> 
>>> 

Re: Aggregations failing on fields with custom analyzer..

2015-02-19 Thread Anil Karaka

"_source" : {
  "Sort" : "",
  "gt" : "2015-02-18T15:07:10",
  "uid" : "54867dc55b482b04da7f23d8",
  "usId" : "54867dc55b482b04da7f23d7",
  "ut" : "2015-02-18T20:37:10",
  "act" : "productlisting",
  "st" : "2015-02-18T15:07:46",
  "Filter" : "",
  "av" : "3.0.0.0",
  "ViewType" : "SmallSingleList",
  "os" : "Windows",
  "categoryid" : "home-kitchen-curtains-blinds"
}

"properties" : {
  "uid" : {
    "analyzer" : "case_insensitive_keyword_analyzer",
    "type" : "string"
  },
  "ViewType" : {
    "analyzer" : "case_insensitive_keyword_analyzer",
    "type" : "string"
  },
  "usId" : {
    "analyzer" : "case_insensitive_keyword_analyzer",
    "type" : "string"
  },
  "os" : {
    "analyzer" : "case_insensitive_keyword_analyzer",
    "type" : "string"
  },
  "Sort" : {
    "analyzer" : "case_insensitive_keyword_analyzer",
    "type" : "string"
  },
  "Filter" : {
    "analyzer" : "case_insensitive_keyword_analyzer",
    "type" : "string"
  },
  "categoryid" : {
    "type" : "double"
  },
  "gt" : {
    "format" : "dateOptionalTime",
    "type" : "date"
  },
  "ut" : {
    "format" : "dateOptionalTime",
    "type" : "date"
  },
  "st" : {
    "format" : "dateOptionalTime",
    "type" : "date"
  },
  "act" : {
    "analyzer" : "case_insensitive_keyword_analyzer",
    "type" : "string"
  },
  "av" : {
    "analyzer" : "case_insensitive_keyword_analyzer",
    "type" : "string"
  }
}


A sample document and the index mappings are above.


On Thursday, February 19, 2015 at 2:03:11 PM UTC+5:30, David Pilato wrote:
>
> I don’t know without a concrete example.
> I’d say that if your mapping has a numeric type and you send "123", it could 
> work. 
>
> -- 
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
> @dadoonet | @elasticsearchfr | @scrutmydocs
>
> On 19 Feb 2015, at 09:30, Anil Karaka wrote:
>
> It was my mistake, the field I was trying to do an aggregation was mapped 
> double, I assumed its a string, after seeing some sample documents with 
> strings..
>
> Why didn't es throw an error when I'm indexing docs with strings instead 
> of double..?
>
> On Thursday, February 19, 2015 at 1:35:08 PM UTC+5:30, David Pilato wrote:
>>
>> Did you apply your analyzer to your mapping?
>>
>> David
>>
>> On 19 Feb 2015, at 08:53, Anil Karaka wrote:
>>
>>
>> http://stackoverflow.com/questions/28601082/terms-aggregation-failing-on-string-fields-with-a-custom-analyzer-in-elasticsear
>>
>> Posted in stack over flow as well..
>>
>> On Thursday, February 19, 2015 at 1:01:40 PM UTC+5:30, Anil Karaka wrote:
>>>
>>> I wanted a custom analyzer that behaves exactly like not_analyzed, 
>>> except that fields are case insensitive..
>>>
>>> I have my analyzer as below, 
>>>
>>> "index": {
>>> "analysis": {
>>> "analyzer": { // Custom Analyzer with keyword tokenizer and 
>>> lowercase filter, same as not_analyzed but case insensitive
>>> "case_insensitive_keyword_analyzer": {
>>> "tokenizer": "keyword",
>>> "filter": "lowercase"
>>> }
>>> }
>>> }
>>> }
>>>
>>> But when I'm trying to do term aggregation over a field with strings 
>>> analyzed as above, I'm getting this error..
>>>
>>> {
>>> "error" 
>>> :"ClassCastException[org.elasticsearch.search.aggregations.bucket.terms.DoubleTerms$Bucket
>>>  cannot be cast to 
>>> org.elasticsearch.search.aggregations.bucket.terms.StringTerms$Bucket]",
>>> "status" : 500
>>> }
>>>
>>> Are there additional settings that I have to update in my custom analyzer 
>>> for my terms aggregation to work..?
>>>
>>>
>>> The better question is I want a custom analyzer that does everything 
>>> similar to not_analyzed but is case insensitive.. How do I achieve that?
>>>
>>>
>>>
>>>

Re: Aggregations failing on fields with custom analyzer..

2015-02-19 Thread David Pilato

I don’t know without a concrete example.
I’d say that if your mapping has a numeric type and you send "123", it could work. 
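
As a rough editorial illustration of the point above (plain Python, not Elasticsearch code): a numeric field can accept a value that arrives as a string as long as the string parses as a number, which would explain why "123" indexes quietly. The function name `coerce_to_double` is hypothetical and only mimics the idea; how a given Elasticsearch version treats non-numeric strings depends on its mapping settings.

```python
def coerce_to_double(value):
    """Mimic lenient numeric coercion: accept numbers and numeric strings."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise ValueError("booleans are not treated as numbers in this sketch")
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        # float() raises ValueError for text that is not a number.
        return float(value.strip())
    raise ValueError(f"cannot coerce {value!r} to a double")
```

Under this sketch, `coerce_to_double("123")` succeeds, while a value such as "home-kitchen-curtains-blinds" from the sample document raises `ValueError`.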

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs




> On 19 Feb 2015, at 09:30, Anil Karaka wrote:
> 
> It was my mistake, the field I was trying to do an aggregation was mapped 
> double, I assumed its a string, after seeing some sample documents with 
> strings..
> 
> Why didn't es throw an error when I'm indexing docs with strings instead of 
> double..?
> 
> On Thursday, February 19, 2015 at 1:35:08 PM UTC+5:30, David Pilato wrote:
> Did you apply your analyzer to your mapping?
> 
> David
> 
> On 19 Feb 2015, at 08:53, Anil Karaka wrote:
> 
>> http://stackoverflow.com/questions/28601082/terms-aggregation-failing-on-string-fields-with-a-custom-analyzer-in-elasticsear
>>  
>> 
>> 
>> Posted in stack over flow as well..
>> 
>> On Thursday, February 19, 2015 at 1:01:40 PM UTC+5:30, Anil Karaka wrote:
>> I wanted a custom analyzer that behaves exactly like not_analyzed, except 
>> that fields are case insensitive..
>> 
>> I have my analyzer as below, 
>> 
>> "index": {
>> "analysis": {
>> "analyzer": { // Custom Analyzer with keyword tokenizer and 
>> lowercase filter, same as not_analyzed but case insensitive
>> "case_insensitive_keyword_analyzer": {
>> "tokenizer": "keyword",
>> "filter": "lowercase"
>> }
>> }
>> }
>> }
>> 
>> But when I'm trying to do term aggregation over a field with strings 
>> analyzed as above, I'm getting this error..
>> 
>> {
>> "error" 
>> :"ClassCastException[org.elasticsearch.search.aggregations.bucket.terms.DoubleTerms$Bucket
>>  cannot be cast to 
>> org.elasticsearch.search.aggregations.bucket.terms.StringTerms$Bucket]",
>> "status" : 500
>> }
>> 
>> Are there additional settings that I have to update in my custom analyzer 
>> for my terms aggregation to work..?
>> 
>> 
>> The better question is I want a custom analyzer that does everything similar 
>> to not_analyzed but is case insensitive.. How do I achieve that?
>> 
>> 
>> 


Re: Aggregations failing on fields with custom analyzer..

2015-02-19 Thread Anil Karaka

It was my mistake: the field I was trying to aggregate on was mapped as 
double. I assumed it was a string after seeing some sample documents with 
string values.

Why didn't ES throw an error when I indexed documents with strings instead of 
doubles?

On Thursday, February 19, 2015 at 1:35:08 PM UTC+5:30, David Pilato wrote:
>
> Did you apply your analyzer to your mapping?
>
> David
>
> On 19 Feb 2015, at 08:53, Anil Karaka wrote:
>
>
> http://stackoverflow.com/questions/28601082/terms-aggregation-failing-on-string-fields-with-a-custom-analyzer-in-elasticsear
>
> Posted in stack over flow as well..
>
> On Thursday, February 19, 2015 at 1:01:40 PM UTC+5:30, Anil Karaka wrote:
>>
>> I wanted a custom analyzer that behaves exactly like not_analyzed, except 
>> that fields are case insensitive..
>>
>> I have my analyzer as below, 
>>
>> "index": {
>> "analysis": {
>> "analyzer": { // Custom Analyzer with keyword tokenizer and 
>> lowercase filter, same as not_analyzed but case insensitive
>> "case_insensitive_keyword_analyzer": {
>> "tokenizer": "keyword",
>> "filter": "lowercase"
>> }
>> }
>> }
>> }
>>
>> But when I'm trying to do term aggregation over a field with strings 
>> analyzed as above, I'm getting this error..
>>
>> {
>> "error" 
>> :"ClassCastException[org.elasticsearch.search.aggregations.bucket.terms.DoubleTerms$Bucket
>>  cannot be cast to 
>> org.elasticsearch.search.aggregations.bucket.terms.StringTerms$Bucket]",
>> "status" : 500
>> }
>>
>> Are there additional settings that I have to update in my custom analyzer 
>> for my terms aggregation to work..?
>>
>>
>> The better question is I want a custom analyzer that does everything similar 
>> to not_analyzed but is case insensitive.. How do I achieve that?
>>
>>
>>


Re: Aggregations failing on fields with custom analyzer..

2015-02-19 Thread David Pilato

Did you apply your analyzer to your mapping?

David

> On 19 Feb 2015, at 08:53, Anil Karaka wrote:
> 
> http://stackoverflow.com/questions/28601082/terms-aggregation-failing-on-string-fields-with-a-custom-analyzer-in-elasticsear
> 
> Posted in stack over flow as well..
> 
>> On Thursday, February 19, 2015 at 1:01:40 PM UTC+5:30, Anil Karaka wrote:
>> I wanted a custom analyzer that behaves exactly like not_analyzed, except 
>> that fields are case insensitive..
>> 
>> I have my analyzer as below, 
>> 
>> "index": {
>> "analysis": {
>> "analyzer": { // Custom Analyzer with keyword tokenizer and 
>> lowercase filter, same as not_analyzed but case insensitive
>> "case_insensitive_keyword_analyzer": {
>> "tokenizer": "keyword",
>> "filter": "lowercase"
>> }
>> }
>> }
>> }
>> 
>> But when I'm trying to do term aggregation over a field with strings 
>> analyzed as above, I'm getting this error..
>> 
>> {
>> "error" 
>> :"ClassCastException[org.elasticsearch.search.aggregations.bucket.terms.DoubleTerms$Bucket
>>  cannot be cast to 
>> org.elasticsearch.search.aggregations.bucket.terms.StringTerms$Bucket]",
>> "status" : 500
>> }
>> 
>> Are there additional settings that I have to update in my custom analyzer 
>> for my terms aggregation to work..?
>> 
>> 
>> The better question is I want a custom analyzer that does everything similar 
>> to not_analyzed but is case insensitive.. How do I achieve that?
> 


Re: Aggregations failing on fields with custom analyzer..

2015-02-18 Thread Anil Karaka

http://stackoverflow.com/questions/28601082/terms-aggregation-failing-on-string-fields-with-a-custom-analyzer-in-elasticsear

Posted on Stack Overflow as well.

On Thursday, February 19, 2015 at 1:01:40 PM UTC+5:30, Anil Karaka wrote:
>
> I wanted a custom analyzer that behaves exactly like not_analyzed, except 
> that fields are case insensitive..
>
> I have my analyzer as below, 
>
> "index": {
> "analysis": {
> "analyzer": { // Custom Analyzer with keyword tokenizer and 
> lowercase filter, same as not_analyzed but case insensitive
> "case_insensitive_keyword_analyzer": {
> "tokenizer": "keyword",
> "filter": "lowercase"
> }
> }
> }
> }
>
> But when I'm trying to do term aggregation over a field with strings analyzed 
> as above, I'm getting this error..
>
> {
> "error" 
> :"ClassCastException[org.elasticsearch.search.aggregations.bucket.terms.DoubleTerms$Bucket
>  cannot be cast to 
> org.elasticsearch.search.aggregations.bucket.terms.StringTerms$Bucket]",
> "status" : 500
> }
>
> Are there additional settings that I have to update in my custom analyzer for 
> my terms aggregation to work..?
>
>
> The better question is I want a custom analyzer that does everything similar 
> to not_analyzed but is case insensitive.. How do I achieve that?
>
>
>
>


Aggregations failing on fields with custom analyzer..

2015-02-18 Thread Anil Karaka

I want a custom analyzer that behaves exactly like not_analyzed, except 
that it is case-insensitive.

My analyzer is defined as follows:

"index": {
  "analysis": {
    "analyzer": {
      // Custom analyzer with keyword tokenizer and lowercase filter:
      // same as not_analyzed, but case-insensitive
      "case_insensitive_keyword_analyzer": {
        "tokenizer": "keyword",
        "filter": "lowercase"
      }
    }
  }
}

But when I run a terms aggregation over a string field analyzed 
as above, I get this error:

{
  "error": "ClassCastException[org.elasticsearch.search.aggregations.bucket.terms.DoubleTerms$Bucket cannot be cast to org.elasticsearch.search.aggregations.bucket.terms.StringTerms$Bucket]",
  "status": 500
}

Are there additional settings that I have to change in my custom analyzer for 
my terms aggregation to work?


More to the point: I want a custom analyzer that does everything 
not_analyzed does but is case-insensitive. How do I achieve that?
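
For reference, here is a minimal sketch of how the analyzer above could be wired into an index-creation body, written as a Python dict so it can be serialized with `json.dumps`. The type name `event` and the choice of the `os` field are assumptions for illustration, and `simulate_analyzer` is a hypothetical plain-Python stand-in for what a keyword tokenizer followed by a lowercase filter produces; it is not an Elasticsearch call.

```python
import json

ANALYZER = "case_insensitive_keyword_analyzer"

# Index creation body: the analyzer definition plus a mapping that applies it.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                ANALYZER: {
                    "type": "custom",
                    "tokenizer": "keyword",   # the whole value becomes one token
                    "filter": ["lowercase"],  # that token is lowercased
                }
            }
        }
    },
    "mappings": {
        "event": {  # hypothetical type name
            "properties": {
                "os": {"type": "string", "analyzer": ANALYZER}
            }
        }
    },
}


def simulate_analyzer(text):
    """Plain-Python stand-in: keyword tokenizer + lowercase filter."""
    return [text.lower()]


# Request body as a JSON string, ready to send when creating the index.
request_json = json.dumps(index_body, indent=2)
```

With this stand-in, "Windows" and "WINDOWS" both yield the single token "windows", which is the case-insensitive, not_analyzed-like behaviour asked for.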




Elasticsearch aggregations for analytics

2015-02-18 Thread Itai Yaffe

Hey,
We have a 20-node Elasticsearch cluster in production that holds a 
few TB of data and loads millions of documents a day.
We use Elasticsearch for analytics, and the main thing we're 
interested in is counting unique users.

We started in production with Elasticsearch 0.9.X, when there was 
no cardinality aggregation, so we had to create the document 
structure shown below.
Most of our queries look for the unique user count for a date 
range and specific segments.
Some of our analytics UI screens require executing hundreds of queries in 
parallel, and one even requires thousands.

When migrating to V1.4, we hoped to start using aggregations, 
but even with doc_values enabled we see aggregation times of 
*minutes*...
We're running on c3.8xlarge EC2 instances with 60GB RAM, of which 30GB is 
allocated to the ES heap.
We have 6 indexes with 2 replicas each; each index has 20 shards.
Each aggregation/query is performed against a single index (see the aggregation 
example below).

Has anyone dealt with such use cases? 

Thanks!

*Document structure*:
{
  "user": {
    "_ttl": {
      "enabled": true
    },
    "properties": {
      "events": {
        "type": "nested",
        "properties": {
          "event_time": {
            "type": "date",
            "format": "dateOptionalTime",
            "doc_values": true
          },
          "segments": {
            "properties": {
              "segment": {
                "type": "string",
                "index": "not_analyzed",
                "doc_values": true
              }
            }
          }
        }
      }
    }
  }
}

For example:
{
  "_index": "...",
  "_type": "...",
  "_id": "...",
  "_version": 1,
  "_score": 1,
  "_source": {
    "events": [
      {
        "event_time": "2014-11-03",
        "segments": [
          { "segment": "ALICE" },
          { "segment": "BOB" }
        ]
      },
      {
        "event_time": "2014-11-04",
        "segments": [
          { "segment": "RON" },
          { "segment": "YULA" }
        ]
      }
    ]
  }
}


*Aggregation example*:
{
  "size": 0,
  "query": {
    "nested": {
      "query": {
        "filtered": {
          "query": {
            "match_all": {}
          },
          "filter": {
            "bool": {
              "must": [
                {
                  "range": {
                    "events.event_time": {
                      "from": "2014-11-17",
                      "to": "2014-11-24",
                      "include_lower": true,
                      "include_upper": true
                    }
                  }
                }
              ]
            }
          }
        }
      },
      "path": "events"
    }
  },
  "aggregations": {
    "nested": {
      "nested": {
        "path": "events"
      },
      "aggregations": {
        "segments": {
          "terms": {
            "field": "events.segments.segment",
            "size": 0
          },
          "aggregations": {
            "uu": {
              "reverse_nested": {}
            }
          }
        }
      }
    }
  }
}
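
For readers following along, this is a small sketch of how the unique-user counts could be read out of a response to the aggregation above. It assumes the usual response layout, where each `segments` bucket carries a `uu` sub-aggregation whose `doc_count` is the number of parent (user) documents. The function name and the sample response are made up for illustration and are not real data.

```python
def unique_users_per_segment(response):
    """Map each segment key to the reverse_nested doc_count (parent docs)."""
    buckets = response["aggregations"]["nested"]["segments"]["buckets"]
    return {bucket["key"]: bucket["uu"]["doc_count"] for bucket in buckets}


# Hypothetical response fragment, shaped like the aggregation names above.
sample_response = {
    "aggregations": {
        "nested": {
            "doc_count": 5,
            "segments": {
                "buckets": [
                    {"key": "ALICE", "doc_count": 3, "uu": {"doc_count": 2}},
                    {"key": "BOB", "doc_count": 2, "uu": {"doc_count": 1}},
                ]
            },
        }
    }
}

counts = unique_users_per_segment(sample_response)
```

The bucket-level `doc_count` counts nested event objects, so it is the inner `uu.doc_count` that corresponds to users.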


Re: Combining two aggregations to get term percentage

2015-02-17 Thread Jari Bakken

Yes!

If I have to do the division on my own I might as well stick with the two
aggregations, AFAICT.
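
A minimal client-side sketch of that two-aggregation approach, assuming the bucket lists have the shape shown in the original question (`key` and `doc_count`). The function name `percentages` is hypothetical.

```python
def percentages(filtered_buckets, total_buckets):
    """Express each filtered bucket as a percentage of its total bucket."""
    totals = {bucket["key"]: bucket["doc_count"] for bucket in total_buckets}
    result = []
    for bucket in filtered_buckets:
        total = totals.get(bucket["key"], 0)
        if total:  # skip keys with no total to avoid dividing by zero
            percent = 100.0 * bucket["doc_count"] / total
            result.append({"key": bucket["key"], "percent": percent})
    return result


# Figures taken from the example in the original question.
total_countries = [
    {"key": "USA", "doc_count": 100},
    {"key": "UK", "doc_count": 50},
]
filtered_countries = [
    {"key": "USA", "doc_count": 10},
    {"key": "UK", "doc_count": 25},
]

shares = percentages(filtered_countries, total_countries)
```

The drawback mentioned above still applies: both bucket lists have to be fetched in full before the division can be done.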

But if it was available as a scoring heuristic I could effectively use {size:
N} so I don’t have to fetch the full set of countries to do this
calculation.

I’ve opened a feature request here
<https://github.com/elasticsearch/elasticsearch/issues/9720>.



On Tue, Feb 17, 2015 at 10:52 AM, Mark Harwood <
mark.harw...@elasticsearch.com> wrote:

> You can choose to ignore the score and compute your own by dividing
> doc_count by bg_count.
>

> Your post has made me think we should add this more easily explainable
> metric as one of the scoring heuristics we offer for this aggregation.
>
> On Tuesday, February 17, 2015 at 10:44:12 AM UTC, Jari Bakken wrote:
>>
>> Thanks Mark!
>>
>> I've been planning to look into `significant_terms`, but didn't know it
>> could help me with this. I'm a bit concerned that a too clever scoring
>> could be hard to explain to users, but I'll give it a shot.
>>
>>
>> On Tue, Feb 17, 2015 at 9:41 AM, Mark Harwood wrote:
>>
>>> Nice to see someone taking the trouble to put their stats in context.
>>> Drives me nuts every time I see the equivalent of this:
>>> http://xkcd.com/1138/
>>>
>>> So we have a feature that does some of what you are after - it's called
>>> the "significant_terms" aggregation.
>>> Your query would look like this:
>>> {
>>> "query" :
>>> {
>>>  "match" : {
>>> "text": "foo"
>>> }
>>> },
>>> "aggs":{
>>> "keywords":{
>>> "significant_terms":{
>>> "field":"country",
>>> "size":100
>>> }
>>> }
>>> }
>>> }
>>>
>>> What you get back are buckets for each country with a doc_count that
>>> represents how many "foo" documents there were in that country and a
>>> background count called "bg_count" which is how many docs (foo and non foo)
>>> came from that country. Selections are ranked using a score that is
>>> returned and which is more nuanced than a straight doc_count/bg_count
>>> percentage. In practice we find prioritizing selections solely by a
>>> percentage measure can skew results towards very rare terms (in your case very
>>> small countries) that have few data samples and so can more easily achieve
>>> high-scoring percentages. Instead, we offer a variety of scoring heuristics
>>> which place a different emphasis on popular vs rare when it comes to
>>> ranking: (see https://twitter.com/elasticmark/status/513320986956292096
>>> )
>>>
>>> Cheers
>>> Mark
>>>
>>> On Tuesday, February 17, 2015 at 1:07:31 AM UTC, ja...@holderdeord.no
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I'm looking for a way to have Elasticsearch calculate the percentage of
>>>> docs that match a query *within* a terms aggregation.
>>>> That is, given two aggregations where one is filtered and the other is
>>>> not:
>>>>
>>>> {
>>>> aggregations: {
>>>> countries: {
>>>> filter: {
>>>> query: {
>>>> query_string: {
>>>> default_field: "description",
>>>> query: "foo"
>>>> }
>>>> }
>>>> },
>>>> aggregations: {
>>>> filteredCountries: {
>>>> terms: { field: "country" }
>>>> }
>>>> }
>>>> },
>>>> totalCountries: {
>>>> terms: { field: "countries" }
>>>> }
>>>> },
>>>> size: 0
>>>> }
>>>>
>>>> Let's say the totalCountries buckets are:
>>>>
>>>> "buckets": [
>>>> {
>>>> "key": "USA",
>>>> "doc_count": 100
>>>> },
>>>> {
>>>> "key": "UK",
>>>> "doc_count": 50
>>>> }
>>>> ]

Re: Combining two aggregations to get term percentage

2015-02-17 Thread Mark Harwood

You can choose to ignore the score and compute your own by dividing 
doc_count by bg_count.
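
A small sketch of that division, assuming each significant_terms bucket exposes `key`, `doc_count` and `bg_count` as described in this thread. The function name and the sample buckets are illustrative only; the numbers reuse the USA/UK figures from the original question.

```python
def foreground_share(buckets):
    """Return (key, doc_count / bg_count) pairs, highest share first."""
    shares = [
        (bucket["key"], bucket["doc_count"] / bucket["bg_count"])
        for bucket in buckets
        if bucket["bg_count"] > 0  # guard against an empty background set
    ]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


# Hypothetical buckets reusing the figures from the original question.
sample_buckets = [
    {"key": "USA", "doc_count": 10, "bg_count": 100},
    {"key": "UK", "doc_count": 25, "bg_count": 50},
]

ranked = foreground_share(sample_buckets)
```

Note that re-ranking this way only reorders the buckets the aggregation already returned, which were selected by its own score.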

Your post has made me think we should add this more easily explainable 
metric as one of the scoring heuristics we offer for this aggregation.

On Tuesday, February 17, 2015 at 10:44:12 AM UTC, Jari Bakken wrote:
>
> Thanks Mark! 
>
> I've been planning to look into `significant_terms`, but didn't know it 
> could help me with this. I'm a bit concerned that a too clever scoring 
> could be hard to explain to users, but I'll give it a shot.
>
>
> On Tue, Feb 17, 2015 at 9:41 AM, Mark Harwood wrote:
>
>> Nice to see someone taking the trouble to put their stats in context.  
>> Drives me nuts every time I see the equivalent of this: 
>> http://xkcd.com/1138/
>>
>> So we have a feature that does some of what you are after - it's called 
>> the "significant_terms" aggregation.
>> Your query would look like this:
>> {
>> "query" :
>> {
>>  "match" : {
>> "text": "foo"
>> }
>> },
>> "aggs":{
>> "keywords":{
>> "significant_terms":{
>> "field":"country",
>> "size":100
>> }
>> }
>> }
>> }
>>
>> What you get back are buckets for each country with a doc_count that 
>> represents how many "foo" documents there were in that country and a 
>> background count called "bg_count" which is how many docs (foo and non foo) 
>> came from that country. Selections are ranked using a score that is 
>> returned and which is more nuanced than a straight doc_count/bg_count 
>> percentage. In practice we find prioritizing selections solely by a 
>> percentage measure can skew results towards very rare terms (in your case very 
>> small countries) that have few data samples and so can more easily achieve 
>> high-scoring percentages. Instead, we offer a variety of scoring heuristics 
>> which place a different emphasis on popular vs rare when it comes to 
>> ranking: (see https://twitter.com/elasticmark/status/513320986956292096 )
>>
>> Cheers
>> Mark
>>
>> On Tuesday, February 17, 2015 at 1:07:31 AM UTC, ja...@holderdeord.no 
>> wrote:
>>>
>>> Hi,
>>>
>>> I'm looking for a way to have Elasticsearch calculate the percentage of 
>>> docs that match a query *within* a terms aggregation. 
>>> That is, given two aggregations where one is filtered and the other is 
>>> not:
>>>
>>> {
>>> aggregations: {
>>> countries: {
>>> filter: {   
>>> query: {
>>> query_string: {
>>> default_field: "description",
>>> query: "foo"
>>> }
>>> }
>>> },
>>> aggregations: { 
>>> filteredCountries: { 
>>> terms: { field: "country" }
>>> }
>>> }
>>> },
>>> totalCountries: {
>>> terms: { field: "countries" }
>>> }
>>> },
>>> size: 0
>>> }
>>>
>>> Let's say the totalCountries buckets are:
>>>
>>> "buckets": [
>>> {
>>> "key": "USA",
>>> "doc_count": 100
>>> },
>>> {
>>> "key": "UK",
>>> "doc_count": 50
>>> }
>>> ]
>>>
>>>
>>> and the filteredCountries buckets are: 
>>>
>>> "buckets": [
>>> {
>>> "key": "USA",
>>> "doc_count": 10
>>> },
>>> {
>>> "key": "UK",
>>> "doc_count": 25
>>> }
>>> ]
>>>
>>>
>>> Is there a way to get a response that returns filteredCountries as 
>>> percentages of totalCountries? I.e. something like:
>>>
>>> [
>>> {
>>> "key": "USA",
>>> "percent": 10
>>> },
>>> {
>>> "key": "UK",
>>> "percent": 50
>>> }
>>> ]
>>>
>>> Thanks!
>>>


Re: Combining two aggregations to get term percentage

2015-02-17 Thread Jari Bakken

Thanks Mark!

I've been planning to look into `significant_terms`, but didn't know it
could help me with this. I'm a bit concerned that a too clever scoring
could be hard to explain to users, but I'll give it a shot.


On Tue, Feb 17, 2015 at 9:41 AM, Mark Harwood <
mark.harw...@elasticsearch.com> wrote:

> Nice to see someone taking the trouble to put their stats in context.
> Drives me nuts every time I see the equivalent of this:
> http://xkcd.com/1138/
>
> So we have a feature that does some of what you are after - it's called
> the "significant_terms" aggregation.
> Your query would look like this:
> {
> "query" :
> {
>  "match" : {
> "text": "foo"
> }
> },
> "aggs":{
> "keywords":{
> "significant_terms":{
> "field":"country",
> "size":100
> }
> }
> }
> }
>
> What you get back are buckets for each country with a doc_count that
> represents how many "foo" documents there were in that country and a
> background count called "bg_count" which is how many docs (foo and non foo)
> came from that country. Selections are ranked using a score that is
> returned and which is more nuanced than a straight doc_count/bg_count
> percentage. In practice we find prioritizing selections solely by a
> percentage measure can skew results towards very rare terms (in your case very
> small countries) that have few data samples and so can more easily achieve
> high-scoring percentages. Instead, we offer a variety of scoring heuristics
> which place a different emphasis on popular vs rare when it comes to
> ranking: (see https://twitter.com/elasticmark/status/513320986956292096 )
>
> Cheers
> Mark
>
> On Tuesday, February 17, 2015 at 1:07:31 AM UTC, ja...@holderdeord.no
> wrote:
>>
>> Hi,
>>
>> I'm looking for a way to have Elasticsearch calculate the percentage of
>> docs that match a query *within* a terms aggregation.
>> That is, given two aggregations where one is filtered and the other is
>> not:
>>
>> {
>> aggregations: {
>> countries: {
>> filter: {
>> query: {
>> query_string: {
>> default_field: "description",
>> query: "foo"
>> }
>> }
>> },
>> aggregations: {
>> filteredCountries: {
>> terms: { field: "country" }
>> }
>> }
>> },
>> totalCountries: {
>> terms: { field: "countries" }
>> }
>> },
>> size: 0
>> }
>>
>> Let's say the totalCountries buckets are:
>>
>> "buckets": [
>> {
>> "key": "USA",
>> "doc_count": 100
>> },
>> {
>> "key": "UK",
>> "doc_count": 50
>> }
>> ]
>>
>>
>> and the filteredCountries buckets are:
>>
>> "buckets": [
>> {
>> "key": "USA",
>> "doc_count": 10
>> },
>> {
>> "key": "UK",
>> "doc_count": 25
>> }
>> ]
>>
>>
>> Is there a way to get a response that returns filteredCountries as
>> percentages of totalCountries? I.e. something like:
>>
>> [
>> {
>> "key": "USA",
>> "percent": 10
>> },
>> {
>> "key": "UK",
>> "percent": 50
>> }
>> ]
>>
>> Thanks!
>>


Re: Combining two aggregations to get term percentage

2015-02-17 Thread Mark Harwood

Nice to see someone taking the trouble to put their stats in context. 
It drives me nuts every time I see the equivalent of this: 
http://xkcd.com/1138/

So we have a feature that does some of what you are after - it's called the 
"significant_terms" aggregation.
Your query would look like this:
{
  "query": {
    "match": {
      "text": "foo"
    }
  },
  "aggs": {
    "keywords": {
      "significant_terms": {
        "field": "country",
        "size": 100
      }
    }
  }
}

What you get back is a bucket for each country, with a doc_count that 
represents how many "foo" documents there were in that country and a 
background count called "bg_count", which is how many docs (foo and non-foo) 
came from that country. Selections are ranked using a returned score that 
is more nuanced than a straight doc_count/bg_count percentage. In practice 
we find that prioritizing selections solely by a percentage measure can 
skew results towards very rare terms (in your case, very small countries) 
that have few data samples and so can more easily achieve high-scoring 
percentages. Instead, we offer a variety of scoring heuristics which place 
a different emphasis on popular vs rare when it comes to ranking (see 
https://twitter.com/elasticmark/status/513320986956292096).
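
To make the skew concrete, here is a small Python sketch that ranks buckets purely by doc_count/bg_count. The bucket values are made up for illustration (the "Tuvalu" entry is a hypothetical tiny country with a single document), and this is not how significant_terms itself scores terms:

```python
# Sketch: ranking by raw percentage favours terms with very few samples.
# All numbers here are invented for illustration.

buckets = [
    {"key": "USA", "doc_count": 10, "bg_count": 100},
    {"key": "UK", "doc_count": 25, "bg_count": 50},
    {"key": "Tuvalu", "doc_count": 1, "bg_count": 1},  # hypothetical tiny country
]


def raw_percentage(bucket):
    """Percentage of a country's docs that matched the query."""
    return 100.0 * bucket["doc_count"] / bucket["bg_count"]


# Sort from highest to lowest raw percentage.
ranked = sorted(buckets, key=raw_percentage, reverse=True)
ranked_keys = [bucket["key"] for bucket in ranked]
print(ranked_keys)
```

The single-document country lands at the top of the list even though one sample says very little, which is the effect the scoring heuristics are meant to dampen.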

Cheers
Mark

On Tuesday, February 17, 2015 at 1:07:31 AM UTC, ja...@holderdeord.no wrote:
>
> Hi,
>
> I'm looking for a way to have Elasticsearch calculate the percentage of 
> docs that match a query *within* a terms aggregation. 
> That is, given two aggregations where one is filtered and the other is not:
>
> {
> aggregations: {
> countries: {
> filter: {   
> query: {
> query_string: {
> default_field: "description",
> query: "foo"
> }
> }
> },
> aggregations: { 
> filteredCountries: { 
> terms: { field: "country" }
> }
> }
> },
> totalCountries: {
> terms: { field: "country" }
> }
> },
> size: 0
> }
>
> Let's say the totalCountries buckets are:
>
> "buckets": [
> {
> "key": "USA",
> "doc_count": 100
> },
> {
> "key": "UK",
> "doc_count": 50
> }
> ]
>
>
> and the filteredCountries buckets are: 
>
> "buckets": [
> {
> "key": "USA",
> "doc_count": 10
> },
> {
> "key": "UK",
> "doc_count": 25
> }
> ]
>
>
> Is there a way to get a response that returns filteredCountries as 
> percentages of totalCountries? I.e. something like:
>
> [
> {
> "key": "USA",
> "percent": 10
> },
> {
> "key": "UK",
> "percent": 50
> }
> ]
>
> Thanks!
>


Combining two aggregations to get term percentage

2015-02-16 Thread jari

Hi,

I'm looking for a way to have Elasticsearch calculate the percentage of 
docs that match a query *within* a terms aggregation. 
That is, given two aggregations where one is filtered and the other is not:

{
  "aggregations": {
    "countries": {
      "filter": {
        "query": {
          "query_string": {
            "default_field": "description",
            "query": "foo"
          }
        }
      },
      "aggregations": {
        "filteredCountries": {
          "terms": { "field": "country" }
        }
      }
    },
    "totalCountries": {
      "terms": { "field": "country" }
    }
  },
  "size": 0
}

Let's say the totalCountries buckets are:

"buckets": [
  {
    "key": "USA",
    "doc_count": 100
  },
  {
    "key": "UK",
    "doc_count": 50
  }
]


and the filteredCountries buckets are: 

"buckets": [
  {
    "key": "USA",
    "doc_count": 10
  },
  {
    "key": "UK",
    "doc_count": 25
  }
]


Is there a way to get a response that returns filteredCountries as 
percentages of totalCountries? I.e. something like:

[
  {
    "key": "USA",
    "percent": 10
  },
  {
    "key": "UK",
    "percent": 50
  }
]
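
For reference, the desired output can be computed client-side from the two bucket lists. This is only a Python sketch of the arithmetic, assuming both aggregations are keyed on the same field; the function name percent_of_total is made up here and is not an Elasticsearch feature:

```python
# Sketch: combine two terms-aggregation bucket lists into percentages.
# The bucket data mirrors the example numbers above.

def percent_of_total(filtered_buckets, total_buckets):
    """Return [{"key", "percent"}] for every filtered bucket with a known total."""
    totals = {bucket["key"]: bucket["doc_count"] for bucket in total_buckets}
    result = []
    for bucket in filtered_buckets:
        total = totals.get(bucket["key"], 0)
        if total == 0:
            # Key missing from the unfiltered aggregation: skip, no percentage.
            continue
        result.append(
            {"key": bucket["key"], "percent": 100.0 * bucket["doc_count"] / total}
        )
    return result


total_countries = [
    {"key": "USA", "doc_count": 100},
    {"key": "UK", "doc_count": 50},
]
filtered_countries = [
    {"key": "USA", "doc_count": 10},
    {"key": "UK", "doc_count": 25},
]

percentages = percent_of_total(filtered_countries, total_countries)
print(percentages)
```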

Thanks!


Re: Large results sets and paging for Aggregations

2015-02-11 Thread Mark Harwood

>
> 1.) Would I assume that as my document count would increase, the time for 
> aggregation calculation would as well increase? 
>

Yes - documents are processed serially through the tree, but it happens 
really quickly. Each shard does this at the same time to produce its own 
summary, which is the key to scaling if things start slowing down.

> 2.) How would I relate this analogy with sub aggregations.
>

Each row of pins in the bean machine represents another decision point for 
direction of travel - this is the equivalent of a sub aggregation. The 
difference is that in the bean machine each layer offers a choice of only 2 
directions - the ball goes left or right around a pin. In your first layer 
of aggregation there are not 2 but 5k choices of direction - one bucket for 
each member. Each member bucket is further broken down by 12 months. 

> My observation says that as you increase the number of child aggregations, 
> so it increases the execution time along with memory utilization. What 
> happens in case of sub aggregations?
>

Hopefully the previous comment has made that clear. Each sub agg 
represents an additional decision point that has to be negotiated by each 
document - first a direction based on member ID, then a direction based 
on month.

> 3.) I didn't get your last statement:
>  "There is however a fixed overhead for all queries which *is* a 
> function of number of docs and that is the Field Data cache required to 
> hold the dates/member IDs in RAM - if this becomes a problem then you may 
> want to look at on-disk alternative structure in the form of "DocValues"."
>

We are fast at routing docs through the agg tree because we can look up 
things like memberID for each matching doc really quickly and then 
determine the appropriate direction of travel. We rely on these look-up 
stores being pre-warmed (i.e. held in Field Data arrays in the JVM, or 
cached by the file system when using the disk-based DocValues equivalents) 
to make our queries fast.
They are a fixed cost shared by all queries that make use of them.

>  
>  4.) Off the topic, but I guess best to ask it here since we are talking 
> about it. :) - DocValues - Since it was introduced in 1.0.0 and most of our 
> mapping was defined in ES 0.9, can I change the mapping of existing fields 
> now? Might be I can take this conversation in another thread but would love 
> to hear about 1-3 points. You made this thread very interesting for me.
>

I recommend you shift that off to another topic.

>
> Thanks
> Piyush 
>
>
>
> On Wednesday, 11 February 2015 15:12:37 UTC+5:30, Mark Harwood wrote:
>>
>> 5k doesn't sound  too scary.
>>
>> Think of the aggs tree like a "Bean Machine" [1] - one of those wooden 
>> boards with pins arranged on it like a christmas tree and you drop balls at 
>> the top of the board and they rattle down a choice of path to the bottom.
>> In the case of aggs, your buckets are the pins and documents are the balls
>>
>> The memory requirement for processing the agg tree is typically the 
>> number of pins, not the number of balls you drop into the tree as these 
>> just fall out of the bottom of the tree.
>> So in your case it is 5k members multiplied by 12 months each = 60k 
>> unique buckets, each of which will maintain a counter of how many docs pass 
>> through that point. So you could pass millions or billions of docs through 
>> and the working memory requirement for the query would be the same.
>> There is however a fixed overhead for all queries which *is* a function 
>> of number of docs and that is the Field Data cache required to hold the 
>> dates/member IDs in RAM - if this becomes a problem then you may want to 
>> look at on-disk alternative structure in the form of "DocValues".
>>
>> Hope that helps.
>>
>> [1] http://en.wikipedia.org/wiki/Bean_machine
>>
>> On Wednesday, February 11, 2015 at 7:04:04 AM UTC, piyush goyal wrote:
>>>
>>> Hi Mark,
>>>
>>> Before getting into queries, here is a little bit info about the project:
>>>
>>> 1.) A community where members keep on increasing, decreasing and 
>>> changing. Maintained in a different type.
>>> 2.) Approximately 3K to 4K documents of data of each user inserted into 
>>> ES per month in a different type maintained by member ID.
>>> 3.) Mapping is flat, there are no nested and array type of data.
>>>
>>> Requirement:
>>>
>>> Here is a sample requirement:
>>>
>>> 1.) Getting a report against each member ID against the count of data 
>>>

Re: Large results sets and paging for Aggregations

2015-02-11 Thread piyush goyal

Aah! This seems to be the best explanation of how aggregations work. 
Thanks a ton for that, Mark. :) A few other questions:

1.) Should I assume that as my document count increases, the time for 
aggregation calculation increases as well? Reasoning: if bucket creation 
happens at the individual shard level, then document counting runs in 
parallel on each shard, which decreases the execution time significantly. 
But within a shard, as the number of documents matching the query grows, 
and assuming the process is linear in time, the execution time would 
increase. 

2.) How would I relate this analogy to sub aggregations? My observation 
is that as you increase the number of child aggregations, both the 
execution time and the memory utilization increase. What happens in the 
case of sub aggregations?

3.) I didn't get your last statement:
 "There is however a fixed overhead for all queries which *is* a 
function of number of docs and that is the Field Data cache required to 
hold the dates/member IDs in RAM - if this becomes a problem then you may 
want to look at on-disk alternative structure in the form of "DocValues"."

4.) Off topic, but I guess it is best to ask here since we are talking 
about it. :) DocValues: since they were introduced in 1.0.0 and most of our 
mappings were defined in ES 0.9, can I change the mapping of existing fields 
now? Maybe I can take this conversation to another thread, but I would love 
to hear about points 1-3. You made this thread very interesting for me.

Thanks
Piyush 

On Wednesday, 11 February 2015 15:12:37 UTC+5:30, Mark Harwood wrote:
>
> 5k doesn't sound  too scary.
>
> Think of the aggs tree like a "Bean Machine" [1] - one of those wooden 
> boards with pins arranged on it like a christmas tree and you drop balls at 
> the top of the board and they rattle down a choice of path to the bottom.
> In the case of aggs, your buckets are the pins and documents are the balls
>
> The memory requirement for processing the agg tree is typically the number 
> of pins, not the number of balls you drop into the tree as these just fall 
> out of the bottom of the tree.
> So in your case it is 5k members multiplied by 12 months each = 60k unique 
> buckets, each of which will maintain a counter of how many docs pass 
> through that point. So you could pass millions or billions of docs through 
> and the working memory requirement for the query would be the same.
> There is however a fixed overhead for all queries which *is* a function 
> of number of docs and that is the Field Data cache required to hold the 
> dates/member IDs in RAM - if this becomes a problem then you may want to 
> look at on-disk alternative structure in the form of "DocValues".
>
> Hope that helps.
>
> [1] http://en.wikipedia.org/wiki/Bean_machine
>
> On Wednesday, February 11, 2015 at 7:04:04 AM UTC, piyush goyal wrote:
>>
>> Hi Mark,
>>
>> Before getting into queries, here is a little bit info about the project:
>>
>> 1.) A community where members keep on increasing, decreasing and 
>> changing. Maintained in a different type.
>> 2.) Approximately 3K to 4K documents of data of each user inserted into 
>> ES per month in a different type maintained by member ID.
>> 3.) Mapping is flat, there are no nested and array type of data.
>>
>> Requirement:
>>
>> Here is a sample requirement:
>>
>> 1.) Getting a report against each member ID against the count of data for 
>> last three month.
>> 2.) Query used to get the data is:
>>
>> {
>>   "query": {
>> "constant_score": {
>>   "filter": {
>> "bool": {
>>   "must": [
>> {"term": {
>>   "datatype": "XYZ"
>> }
>> }, {
>>   "range": {
>> "response_timestamp": {
>>   "from": "2014-11-01",
>>   "to": "2015-01-31"
>> }
>>   }
>> }
>>   ]
>> }
>>   }
>> }
>>   },"aggs": {
>> "memberIDAggs": {
>>   "terms": {
>> "field": "member_id",
>> "size": 0
>>   },"aggs": {
>> "dateHistAggs": {
>>   "date_histogram": {
>> "field": "response_timestamp",
>> "interval": "month"

Re: Large results sets and paging for Aggregations

2015-02-11 Thread Mark Harwood

5k doesn't sound too scary.

Think of the aggs tree like a "Bean Machine" [1] - one of those wooden 
boards with pins arranged on it like a Christmas tree, where you drop balls 
at the top of the board and they rattle down a choice of paths to the bottom.
In the case of aggs, your buckets are the pins and documents are the balls.

The memory requirement for processing the agg tree is typically the number 
of pins, not the number of balls you drop into the tree, as these just fall 
out of the bottom of the tree.
So in your case it is 5k members multiplied by 12 months each = 60k unique 
buckets, each of which will maintain a counter of how many docs pass 
through that point. So you could pass millions or billions of docs through 
and the working memory requirement for the query would be the same.
There is however a fixed overhead for all queries which *is* a function of 
number of docs and that is the Field Data cache required to hold the 
dates/member IDs in RAM - if this becomes a problem then you may want to 
look at on-disk alternative structure in the form of "DocValues".
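
As a rough illustration of the pins-versus-balls point, here is a toy Python sketch. It is not how Elasticsearch is implemented; it simply keeps one counter per (member, month) bucket and shows that the number of counters stays the same when far more documents are pushed through. The helper names and the 50-member size are assumptions chosen to keep it quick to run:

```python
from collections import Counter


def run_agg_tree(docs):
    """Route each doc through member -> month and count it in its bucket."""
    counters = Counter()
    for member_id, month in docs:
        counters[(member_id, month)] += 1
    return counters


def make_docs(members, months, docs_per_bucket):
    """Generate (member_id, month) pairs; a stand-in for real documents."""
    for member_id in range(members):
        for month in range(1, months + 1):
            for _ in range(docs_per_bucket):
                yield (member_id, month)


# Same buckets, 100x more documents in the second run.
small = run_agg_tree(make_docs(50, 12, 1))
large = run_agg_tree(make_docs(50, 12, 100))

print(len(small), sum(small.values()))
print(len(large), sum(large.values()))
```

Both runs end up holding 50 * 12 counters; only the values inside the counters grow. With 5k members and 12 months the same reasoning gives the 60k buckets mentioned above.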

Hope that helps.

[1] http://en.wikipedia.org/wiki/Bean_machine

On Wednesday, February 11, 2015 at 7:04:04 AM UTC, piyush goyal wrote:
>
> Hi Mark,
>
> Before getting into queries, here is a little bit info about the project:
>
> 1.) A community where members keep on increasing, decreasing and changing. 
> Maintained in a different type.
> 2.) Approximately 3K to 4K documents of data of each user inserted into ES 
> per month in a different type maintained by member ID.
> 3.) Mapping is flat, there are no nested and array type of data.
>
> Requirement:
>
> Here is a sample requirement:
>
> 1.) Getting a report against each member ID against the count of data for 
> last three month.
> 2.) Query used to get the data is:
>
> {
>   "query": {
> "constant_score": {
>   "filter": {
> "bool": {
>   "must": [
> {"term": {
>   "datatype": "XYZ"
> }
> }, {
>   "range": {
> "response_timestamp": {
>   "from": "2014-11-01",
>   "to": "2015-01-31"
> }
>   }
> }
>   ]
> }
>   }
> }
>   },"aggs": {
> "memberIDAggs": {
>   "terms": {
> "field": "member_id",
> "size": 0
>   },"aggs": {
> "dateHistAggs": {
>   "date_histogram": {
> "field": "response_timestamp",
> "interval": "month"
>   }
> }
>   }
> }
>   },"size": 0
> }
>
> Now since the current member count is approximately 1K which will increase 
> to 5K in next 10 months. 5K * 4K * 3 times of documents to be used for this 
> aggregation. I guess a major hit on system. And this is only two level of 
> aggregation. Next requirement by our analyst is to get per month data into 
> three different categories. 
>
> What is the optimum solution to this problem?
>
> Regards
> Piyush
>
> On Tuesday, 10 February 2015 16:15:22 UTC+5:30, Mark Harwood wrote:
>>
>> >these kind of queries are hit more for qualitative analysis.
>>
>> Do you have any example queries? The "pay as you go" summarisation need 
>> not be about just maintaining quantities.  In the demo here [1] I derive 
>> "profile" names for people, categorizing them as "newbies", "fanboys" or 
>> "haters" based on a history of their reviewing behaviours in a marketplace. 
>>
>> >By the way, are there any other strategies suggested by ES for these 
>> kind of scenarios?
>>
>> Igor hit on one which is to use some criteria eg. date to limit the 
>> volume of what you analyze in any one query request.
>>
>> [1] 
>> http://www.elasticsearch.org/videos/entity-centric-indexing-london-meetup-sep-2014/
>>
>>
>>
>> On Tuesday, February 10, 2015 at 10:05:24 AM UTC, piyush goyal wrote:
>>>
>>> Thanks Mark. Your suggestion of "pay-as-you-go" seems amazing. But 
>>> considering the dynamics of the application, these kind of queries are hit 
>>> more for qualitative analysis. There are hundred of such queries(I am not 
>>> exaggerating) which are being hit daily by our analytic team. Keeping count 
>>> of all those qualitative checks daily and

Re: Large results sets and paging for Aggregations

2015-02-10 Thread piyush goyal

Hi Mark,

Before getting into queries, here is a little bit of info about the project:

1.) A community whose membership keeps increasing, decreasing and changing. 
Maintained in a different type.
2.) Approximately 3K to 4K documents of data for each user are inserted 
into ES per month, in a different type, keyed by member ID.
3.) The mapping is flat; there is no nested or array-type data.

Requirement:

Here is a sample requirement:

1.) Get a report for each member ID with the count of data for the 
last three months.
2.) The query used to get the data is:

{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "datatype": "XYZ"
              }
            },
            {
              "range": {
                "response_timestamp": {
                  "from": "2014-11-01",
                  "to": "2015-01-31"
                }
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "memberIDAggs": {
      "terms": {
        "field": "member_id",
        "size": 0
      },
      "aggs": {
        "dateHistAggs": {
          "date_histogram": {
            "field": "response_timestamp",
            "interval": "month"
          }
        }
      }
    }
  },
  "size": 0
}

The current member count is approximately 1K and will increase to 5K in 
the next 10 months. That means 5K * 4K * 3 documents (about 60 million) 
would be used for this aggregation, which I guess is a major hit on the 
system. And this is only two levels of aggregation. The next requirement 
from our analysts is to split the per-month data into three different 
categories. 

What is the optimum solution to this problem?

Regards
Piyush

On Tuesday, 10 February 2015 16:15:22 UTC+5:30, Mark Harwood wrote:
>
> >these kind of queries are hit more for qualitative analysis.
>
> Do you have any example queries? The "pay as you go" summarisation need 
> not be about just maintaining quantities.  In the demo here [1] I derive 
> "profile" names for people, categorizing them as "newbies", "fanboys" or 
> "haters" based on a history of their reviewing behaviours in a marketplace. 
>
> >By the way, are there any other strategies suggested by ES for these kind 
> of scenarios?
>
> Igor hit on one which is to use some criteria eg. date to limit the volume 
> of what you analyze in any one query request.
>
> [1] 
> http://www.elasticsearch.org/videos/entity-centric-indexing-london-meetup-sep-2014/
>
>
>
> On Tuesday, February 10, 2015 at 10:05:24 AM UTC, piyush goyal wrote:
>>
>> Thanks Mark. Your suggestion of "pay-as-you-go" seems amazing. But 
>> considering the dynamics of the application, these kind of queries are hit 
>> more for qualitative analysis. There are hundred of such queries(I am not 
>> exaggerating) which are being hit daily by our analytic team. Keeping count 
>> of all those qualitative checks daily and maintaining them as documents is 
>> a headache itself. Addition/update/removals of these documents would cause 
>> us huge maintenance overheads. Hence was thinking of getting something of 
>> getting pagination on aggregations which would definitely help us to keep 
>> our ES memory leaks away.
>>
>> By the way, are there any other strategies suggested by ES for these kind 
>> of scenarios?
>>
>> Thanks
>>
>> On Tuesday, 10 February 2015 15:20:40 UTC+5:30, Mark Harwood wrote:
>>>
>>> > Why can't aggs be based on shard based calculations 
>>>
>>> They are. The "shard_size" setting will determine how many member 
>>> *summaries* will be returned from each shard - we won't stream each 
>>> member's thousands of related records back to a centralized point to 
>>> compute a final result. The final step is to summarise the summaries from 
>>> each shard.
>>>
>>> > if the number of members keep on increasing, day by day ES has to keep 
>>> more and more data into memory to calculate the aggs
>>>
>>> This is a different point to the one above (shard-level computation vs 
>>> memory costs). If your analysis involves summarising the behaviours of 
>>> large numbers of people over time then you may well find the cost of doing 
>>> this in a single query too high when the numbers of people are extremely 
>>> large. There is a cost to any computation and in that scenario you have 
>>> deferred all these member-

Re: Large results sets and paging for Aggregations

2015-02-10 Thread Mark Harwood

>these kind of queries are hit more for qualitative analysis.

Do you have any example queries? The "pay as you go" summarisation need not 
be just about maintaining quantities. In the demo here [1] I derive 
"profile" names for people, categorizing them as "newbies", "fanboys" or 
"haters" based on a history of their reviewing behaviours in a marketplace. 

>By the way, are there any other strategies suggested by ES for these kind 
of scenarios?

Igor hit on one, which is to use some criterion, e.g. date, to limit the 
volume of what you analyze in any one query request.
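
A sketch of that idea in Python: build one request body per month so each request only aggregates a slice of the data. The field names response_timestamp and member_id are taken from the query posted earlier in the thread; the helper names and the choice of monthly windows are assumptions, and the bodies would still need to be sent with whatever client you use:

```python
from datetime import date


def month_starts(start, months):
    """Yield the first day of `months` consecutive months plus the end bound."""
    year, month = start.year, start.month
    for _ in range(months + 1):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def windowed_queries(start, months):
    """Build one aggregation request body per monthly window."""
    bounds = list(month_starts(start, months))
    queries = []
    for lower, upper in zip(bounds, bounds[1:]):
        queries.append({
            "query": {"constant_score": {"filter": {"range": {
                # Half-open window: includes `lower`, excludes `upper`.
                "response_timestamp": {
                    "gte": lower.isoformat(),
                    "lt": upper.isoformat(),
                }
            }}}},
            "aggs": {"memberIDAggs": {"terms": {"field": "member_id"}}},
            "size": 0,
        })
    return queries


queries = windowed_queries(date(2014, 11, 1), 3)
for body in queries:
    print(body["query"]["constant_score"]["filter"]["range"])
```

Each body covers one month, so the per-request volume is bounded by the window rather than by the full history.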

[1] 
http://www.elasticsearch.org/videos/entity-centric-indexing-london-meetup-sep-2014/



On Tuesday, February 10, 2015 at 10:05:24 AM UTC, piyush goyal wrote:
>
> Thanks Mark. Your suggestion of "pay-as-you-go" seems amazing. But 
> considering the dynamics of the application, these kind of queries are hit 
> more for qualitative analysis. There are hundred of such queries(I am not 
> exaggerating) which are being hit daily by our analytic team. Keeping count 
> of all those qualitative checks daily and maintaining them as documents is 
> a headache itself. Addition/update/removals of these documents would cause 
> us huge maintenance overheads. Hence was thinking of getting something of 
> getting pagination on aggregations which would definitely help us to keep 
> our ES memory leaks away.
>
> By the way, are there any other strategies suggested by ES for these kind 
> of scenarios?
>
> Thanks
>
> On Tuesday, 10 February 2015 15:20:40 UTC+5:30, Mark Harwood wrote:
>>
>> > Why can't aggs be based on shard based calculations 
>>
>> They are. The "shard_size" setting will determine how many member 
>> *summaries* will be returned from each shard - we won't stream each 
>> member's thousands of related records back to a centralized point to 
>> compute a final result. The final step is to summarise the summaries from 
>> each shard.
>>
>> > if the number of members keep on increasing, day by day ES has to keep 
>> more and more data into memory to calculate the aggs
>>
>> This is a different point to the one above (shard-level computation vs 
>> memory costs). If your analysis involves summarising the behaviours of 
>> large numbers of people over time then you may well find the cost of doing 
>> this in a single query too high when the numbers of people are extremely 
>> large. There is a cost to any computation and in that scenario you have 
>> deferred all these member-summarising costs to the very last moment. A 
>> better strategy for large-scale analysis of behaviours over time is to use 
>> a "pay-as-you-go" model where you update a per-member summary document at 
>> regular intervals with batches of their related records. This shifts the 
>> bulk of the computation cost from your single query to many smaller costs 
>> when writing data. You can then perform efficient aggs or scan/scroll 
>> operations on *member* documents with pre-summarised attributes e.g. 
>> totalSpend rather than deriving these properties on-the-fly from records 
>> with a shared member ID.
>>
>>
>>
>> On Tuesday, February 10, 2015 at 7:03:17 AM UTC, piyush goyal wrote:
>>>
>>> Well, my use case says I have tens of thousands of records for each 
>>> members. I want to do a simple terms aggs on member ID. If my count of 
>>> member ID remains same throughout .. good enough, if the number of members 
>>> keep on increasing, day by day ES has to keep more and more data into 
>>> memory to calculate the aggs. Does not sound very promising. What we do is 
>>> implementation of routing to put member specific data into a particular 
>>> shard. Why can't aggs be based on shard based calculations so that I am 
>>> safe from loading tons of data into memory.
>>>
>>> Any thoughts?
>>>
>>> On Sunday, 9 November 2014 22:58:12 UTC+5:30, pulkitsinghal wrote:
>>>>
>>>> Sharing a response I received from Igor Motov:
>>>>
>>>> "scroll works only to page results. paging aggs doesn't make sense 
>>>>> since aggs are executed on the entire result set. therefore if it managed 
>>>>> to fit into the memory you should just get it. paging will mean that you 
>>>>> throw away a lot of results that were already calculated. the only way to 
>>>>> "page" is by limiting the results that you are running aggs on. for 
>>>>> example 
>>>>> if your data is sorted by date and you want to build histogram for the 
>>>>> results one date range at a time."
>>>>
>>>>
>>>>


Re: Large results sets and paging for Aggregations

2015-02-10 Thread piyush goyal

Thanks Mark. Your suggestion of "pay-as-you-go" seems amazing. But 
considering the dynamics of the application, these kinds of queries are 
used more for qualitative analysis. There are hundreds of such queries (I 
am not exaggerating) which are run daily by our analytics team. Keeping 
count of all those qualitative checks daily and maintaining them as 
documents is a headache in itself. Additions, updates and removals of these 
documents would cause us huge maintenance overheads. Hence I was thinking 
of getting some kind of pagination on aggregations, which would definitely 
help us to keep our ES memory leaks away.

By the way, are there any other strategies suggested by ES for these kinds 
of scenarios?

Thanks

On Tuesday, 10 February 2015 15:20:40 UTC+5:30, Mark Harwood wrote:
>
> > Why can't aggs be based on shard based calculations 
>
> They are. The "shard_size" setting will determine how many member 
> *summaries* will be returned from each shard - we won't stream each 
> member's thousands of related records back to a centralized point to 
> compute a final result. The final step is to summarise the summaries from 
> each shard.
>
> > if the number of members keep on increasing, day by day ES has to keep 
> more and more data into memory to calculate the aggs
>
> This is a different point to the one above (shard-level computation vs 
> memory costs). If your analysis involves summarising the behaviours of 
> large numbers of people over time then you may well find the cost of doing 
> this in a single query too high when the numbers of people are extremely 
> large. There is a cost to any computation and in that scenario you have 
> deferred all these member-summarising costs to the very last moment. A 
> better strategy for large-scale analysis of behaviours over time is to use 
> a "pay-as-you-go" model where you update a per-member summary document at 
> regular intervals with batches of their related records. This shifts the 
> bulk of the computation cost from your single query to many smaller costs 
> when writing data. You can then perform efficient aggs or scan/scroll 
> operations on *member* documents with pre-summarised attributes e.g. 
> totalSpend rather than deriving these properties on-the-fly from records 
> with a shared member ID.
>
>
>
> On Tuesday, February 10, 2015 at 7:03:17 AM UTC, piyush goyal wrote:
>>
>> Well, my use case says I have tens of thousands of records for each 
>> members. I want to do a simple terms aggs on member ID. If my count of 
>> member ID remains same throughout .. good enough, if the number of members 
>> keep on increasing, day by day ES has to keep more and more data into 
>> memory to calculate the aggs. Does not sound very promising. What we do is 
>> implementation of routing to put member specific data into a particular 
>> shard. Why can't aggs be based on shard based calculations so that I am 
>> safe from loading tons of data into memory.
>>
>> Any thoughts?
>>
>> On Sunday, 9 November 2014 22:58:12 UTC+5:30, pulkitsinghal wrote:
>>>
>>> Sharing a response I received from Igor Motov:
>>>
>>> "scroll works only to page results. paging aggs doesn't make sense since 
>>>> aggs are executed on the entire result set. therefore if it managed to fit 
>>>> into the memory you should just get it. paging will mean that you throw 
>>>> away a lot of results that were already calculated. the only way to "page" 
>>>> is by limiting the results that you are running aggs on. for example if 
>>>> your data is sorted by date and you want to build histogram for the 
>>>> results 
>>>> one date range at a time."
>>>
>>>
>>>


Re: Large results sets and paging for Aggregations

2015-02-10 Thread Mark Harwood

> Why can't aggs be based on shard based calculations 

They are. The "shard_size" setting will determine how many member 
*summaries* will be returned from each shard - we won't stream each 
member's thousands of related records back to a centralized point to 
compute a final result. The final step is to summarise the summaries from 
each shard.
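
A toy Python sketch of "summarise the summaries", assuming each shard returns only its top shard_size member counts and a coordinating step merges them. The function names and sample data are made up, and the real implementation differs in detail:

```python
from collections import Counter


def shard_summary(member_ids, shard_size):
    """Count docs per member on one shard and keep only the top `shard_size`."""
    return Counter(member_ids).most_common(shard_size)


def merge_summaries(summaries, size):
    """Add up the per-shard counts and return the overall top `size` members."""
    merged = Counter()
    for summary in summaries:
        for member_id, count in summary:
            merged[member_id] += count
    return merged.most_common(size)


# Each list stands in for the member IDs of the docs held by one shard.
shard_a = ["m1"] * 5 + ["m2"] * 3 + ["m3"]
shard_b = ["m2"] * 4 + ["m3"] * 2 + ["m1"]

summaries = [shard_summary(shard_a, 2), shard_summary(shard_b, 2)]
top = merge_summaries(summaries, 2)
print(top)
```

Only the small summaries travel between shards and the coordinating step, never the raw records. Note that in this sample m1 is reported with 5 docs although it has 6 in total, because the second shard's summary was cut off before m1; a larger shard_size reduces that kind of error.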

> if the number of members keep on increasing, day by day ES has to keep 
more and more data into memory to calculate the aggs

This is a different point to the one above (shard-level computation vs 
memory costs). If your analysis involves summarising the behaviours of 
large numbers of people over time then you may well find the cost of doing 
this in a single query too high when the numbers of people are extremely 
large. There is a cost to any computation and in that scenario you have 
deferred all these member-summarising costs to the very last moment. A 
better strategy for large-scale analysis of behaviours over time is to use 
a "pay-as-you-go" model where you update a per-member summary document at 
regular intervals with batches of their related records. This shifts the 
bulk of the computation cost from your single query to many smaller costs 
when writing data. You can then perform efficient aggs or scan/scroll 
operations on *member* documents with pre-summarised attributes e.g. 
totalSpend rather than deriving these properties on-the-fly from records 
with a shared member ID.
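
The "pay-as-you-go" idea above can be sketched roughly as follows. This is only an illustration in plain Python, not Elasticsearch code: the `member_id`, `amount` and `recordCount` names are assumptions made up for the example (only `totalSpend` comes from the message), and in practice each summary would be written back as a per-member document.

```python
from collections import defaultdict


def new_summary():
    # One summary document per member. Only totalSpend is named in the
    # thread; recordCount is a hypothetical extra attribute.
    return {"totalSpend": 0.0, "recordCount": 0}


def apply_batch(summaries, records):
    """Fold one batch of raw records into the per-member summaries."""
    for record in records:
        summary = summaries[record["member_id"]]
        summary["totalSpend"] += record["amount"]
        summary["recordCount"] += 1
    return summaries


summaries = defaultdict(new_summary)

# Each call represents one regular write-time batch.
apply_batch(summaries, [
    {"member_id": "m1", "amount": 10.0},
    {"member_id": "m2", "amount": 5.0},
])
apply_batch(summaries, [{"member_id": "m1", "amount": 2.5}])
```

A later query then only has to read one small document per member instead of re-deriving the totals from every raw record.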

On Tuesday, February 10, 2015 at 7:03:17 AM UTC, piyush goyal wrote:
>
> Well, my use case says I have tens of thousands of records for each 
> members. I want to do a simple terms aggs on member ID. If my count of 
> member ID remains same throughout .. good enough, if the number of members 
> keep on increasing, day by day ES has to keep more and more data into 
> memory to calculate the aggs. Does not sound very promising. What we do is 
> implementation of routing to put member specific data into a particular 
> shard. Why can't aggs be based on shard based calculations so that I am 
> safe from loading tons of data into memory.
>
> Any thoughts?
>
> On Sunday, 9 November 2014 22:58:12 UTC+5:30, pulkitsinghal wrote:
>>
>> Sharing a response I received from Igor Motov:
>>
>> "scroll works only to page results. paging aggs doesn't make sense since 
>>> aggs are executed on the entire result set. therefore if it managed to fit 
>>> into the memory you should just get it. paging will mean that you throw 
>>> away a lot of results that were already calculated. the only way to "page" 
>>> is by limiting the results that you are running aggs on. for example if 
>>> your data is sorted by date and you want to build histogram for the results 
>>> one date range at a time."
>>
>>
>>


Re: Large results sets and paging for Aggregations

2015-02-09 Thread piyush goyal

Well, in my use case I have tens of thousands of records for each 
member, and I want to do a simple terms agg on member ID. If the number of 
members stays the same, good enough; but if it keeps increasing day by day, 
ES has to keep more and more data in memory to calculate the aggs. That 
does not sound very promising. What we do is use routing to put 
member-specific data on a particular shard. Why can't aggs be based on 
shard-level calculations, so that I am safe from loading tons of data into 
memory?

Any thoughts?

On Sunday, 9 November 2014 22:58:12 UTC+5:30, pulkitsinghal wrote:
>
> Sharing a response I received from Igor Motov:
>
> "scroll works only to page results. paging aggs doesn't make sense since 
>> aggs are executed on the entire result set. therefore if it managed to fit 
>> into the memory you should just get it. paging will mean that you throw 
>> away a lot of results that were already calculated. the only way to "page" 
>> is by limiting the results that you are running aggs on. for example if 
>> your data is sorted by date and you want to build histogram for the results 
>> one date range at a time."
>
>
>


Re: Persisting Aggregations

2015-02-03 Thread AndrewK

Thanks for the feedback Itamar: I had a feeling that that would be the case 
but the confirmation is helpful (and storing the results back in 
ES/elsewhere is not a problem).

Regards, Andrew 

On Tuesday, 3 February 2015 10:52:33 UTC+1, Itamar Syn-Hershko wrote:
>
> The Aggs Fw doesn't allow for persisting results, mainly because it is 
> targeted at real-time data that can still change, but it does support 
> caching as of 1.4. That is, if you issue the same query & aggregations 
> request again and again you will be served directly from cache, given the 
> data hasn't changed.
>
> That is to say, if you care about performance, the caching layer should be 
> the answer. If you need other things (point in time view of data, further 
> processing, etc) you will need to store the results back to ES or other 
> storage as a document.
>
> HTH
>
> --
>
> Itamar Syn-Hershko
> http://code972.com | @synhershko <https://twitter.com/synhershko>
> Freelance Developer & Consultant
> Lucene.NET committer and PMC member
>
> On Tue, Feb 3, 2015 at 11:21 AM, AndrewK wrote:
>
>> I've not yet used the aggregations framework, but one question that has 
>> come up recently with contacts and prospective clients is how best to 
>> persist aggregations in ElasticSearch for repeated use.
>>
>> If I have understood the documentation correctly, the aggregation 
>> framework does a pretty good job of using shard caching to make 
>> repeated-or-similar queries as efficient as possible, but it would - 
>> presumably - be even better if "static" results (i.e. which will hardly 
>> ever - or never - change) could be persisted in some way (in a dedicated 
>> index, for example). 
>>
>> Is this possible "internally" (i.e. to GET an aggregation result and POST 
>> it in one call) or would one simply have to extract the desired data and 
>> then post it oneself?
>>
>> Regards, Andrew
>
>


Re: Persisting Aggregations

2015-02-03 Thread Itamar Syn-Hershko

The aggregations framework doesn't allow for persisting results, mainly
because it is targeted at real-time data that can still change, but it does
support caching as of 1.4. That is, if you issue the same query and
aggregations request again and again, you will be served directly from
cache, provided the data hasn't changed.

That is to say, if you care about performance, the caching layer should be
the answer. If you need other things (point in time view of data, further
processing, etc) you will need to store the results back to ES or other
storage as a document.
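
As a rough sketch of the "store the results back as a document" option, the snippet below turns the buckets of a terms aggregation response into standalone documents that could then be indexed. The aggregation name `by_member`, the `taken_at` field and the sample response are assumptions for illustration; the actual indexing call is left out on purpose.

```python
def buckets_to_docs(response, agg_name, taken_at):
    """Turn each bucket of a terms aggregation into a standalone document."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [
        {
            "aggregation": agg_name,
            "key": bucket["key"],
            "doc_count": bucket["doc_count"],
            # Timestamp of the snapshot, so the stored result can serve
            # as a point-in-time view of the data.
            "taken_at": taken_at,
        }
        for bucket in buckets
    ]


# Hypothetical response, in the shape returned for a terms aggregation.
sample_response = {
    "aggregations": {
        "by_member": {
            "buckets": [
                {"key": "m1", "doc_count": 42},
                {"key": "m2", "doc_count": 7},
            ]
        }
    }
}

snapshot_docs = buckets_to_docs(sample_response, "by_member", "2015-02-03T10:52:33Z")
```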

HTH

--

Itamar Syn-Hershko
http://code972.com | @synhershko <https://twitter.com/synhershko>
Freelance Developer & Consultant
Lucene.NET committer and PMC member

On Tue, Feb 3, 2015 at 11:21 AM, AndrewK  wrote:

> I've not yet used the aggregations framework, but one question that has
> come up recently with contacts and prospective clients is how best to
> persist aggregations in ElasticSearch for repeated use.
>
> If I have understood the documentation correctly, the aggregation
> framework does a pretty good job of using shard caching to make
> repeated-or-similar queries as efficient as possible, but it would -
> presumably - be even better if "static" results (i.e. which will hardly
> ever - or never - change) could be persisted in some way (in a dedicated
> index, for example).
>
> Is this possible "internally" (i.e. to GET an aggregation result and POST
> it in one call) or would one simply have to extract the desired data and
> then post it oneself?
>
> Regards, Andrew


Persisting Aggregations

2015-02-03 Thread AndrewK

I've not yet used the aggregations framework, but one question that has 
come up recently with contacts and prospective clients is how best to 
persist aggregations in Elasticsearch for repeated use.

If I have understood the documentation correctly, the aggregation framework 
does a pretty good job of using shard caching to make repeated-or-similar 
queries as efficient as possible, but it would - presumably - be even 
better if "static" results (i.e. which will hardly ever - or never - 
change) could be persisted in some way (in a dedicated index, for example). 

Is this possible "internally" (i.e. to GET an aggregation result and POST 
it in one call) or would one simply have to extract the desired data and 
then post it oneself?

Regards, Andrew


Re: Kibana 4 - Issues with Filters Aggregations

2015-01-27 Thread renaud

Hi,

We were wondering if anyone has had time to look at these issues. Are they 
already known, or should we open a GitHub issue for them?

Thanks
-- 
Renaud Delbru

On Monday, January 26, 2015 at 1:03:19 PM UTC, ren...@sindicetech.com wrote:
>
>
> 
> Hi,
>
> We tried to create panels based on Filters Aggregation on the latest 
> Kibana source, and we encountered the following problems:
>
> 1) Buckets from Filters aggregation are not displayed in Data table
>
> By exporting the raw response, we can see that the buckets are created, 
> however, nothing is displayed in the table. We then tried to use another 
> panel (pie chart or bar chart) with the same filters aggregation and the 
> buckets were displayed correctly.
>
> 2) Selecting a bucket does not affect search result
>
> On a pie or bar chart, configured with Filters Aggregation, if we click on 
> a bucket, nothing is happening - the search result is not restricted based 
> on the selected bucket
>
> 3) After selecting a bucket, dashboard ui does not display applied filters
>
> After clicking on a bucket from a chart configured with Filters 
> Aggregation, the dashboard UI does not display anymore the current applied 
> filters.
>
> See attached screen recording showing the last two problems.
>
> -- 
> Renaud Delbru
>
>
>


Kibana 4 - Issues with Filters Aggregations

2015-01-26 Thread renaud




Hi,

We tried to create panels based on Filters Aggregation on the latest Kibana 
source, and we encountered the following problems:

1) Buckets from Filters aggregation are not displayed in Data table

By exporting the raw response, we can see that the buckets are created, 
however, nothing is displayed in the table. We then tried to use another 
panel (pie chart or bar chart) with the same filters aggregation and the 
buckets were displayed correctly.

2) Selecting a bucket does not affect search result

On a pie or bar chart configured with a Filters Aggregation, clicking on 
a bucket does nothing: the search results are not restricted to the 
selected bucket.

3) After selecting a bucket, dashboard UI does not display applied filters

After clicking on a bucket in a chart configured with a Filters 
Aggregation, the dashboard UI no longer displays the currently applied 
filters.

See attached screen recording showing the last two problems.

-- 
Renaud Delbru



Re: how to combine aggregations

2015-01-15 Thread Adrien Grand

I believe you could run a terms aggregation on the city field, and under
this terms aggregation put two sum aggregations, one for clicks and one for
displays. Finally, you could derive the click rate from the sums of
clicks and displays on the client side. If you are just starting out with
aggregations, I would recommend reading this blog post by Zachary Tong:
http://www.elasticsearch.org/blog/intro-to-aggregations/
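
A minimal sketch of that suggestion, written as a Python request body plus the client-side division. The field names `city`, `clicks` and `displays`, the aggregation names, and the sample response are all assumptions; the two top-level sums are an addition to get a global rate without relying on the terms buckets, which by default only cover the top cities.

```python
query = {
    "size": 0,  # only the aggregations are needed, not the hits
    "aggs": {
        "by_city": {
            "terms": {"field": "city"},
            "aggs": {
                "clicks": {"sum": {"field": "clicks"}},
                "displays": {"sum": {"field": "displays"}},
            },
        },
        # Grand totals, used for the global click rate.
        "total_clicks": {"sum": {"field": "clicks"}},
        "total_displays": {"sum": {"field": "displays"}},
    },
}


def click_rate(clicks, displays):
    """clicks / displays, or None when there were no displays."""
    return clicks / displays if displays else None


def rates_from_response(response):
    """Derive per-city and global click rates on the client side."""
    aggs = response["aggregations"]
    per_city = {
        bucket["key"]: click_rate(bucket["clicks"]["value"],
                                  bucket["displays"]["value"])
        for bucket in aggs["by_city"]["buckets"]
    }
    overall = click_rate(aggs["total_clicks"]["value"],
                         aggs["total_displays"]["value"])
    return per_city, overall


# Hypothetical response for the query above.
sample_response = {
    "aggregations": {
        "by_city": {
            "buckets": [
                {"key": "Paris", "doc_count": 10,
                 "clicks": {"value": 5.0}, "displays": {"value": 100.0}},
                {"key": "Lyon", "doc_count": 4,
                 "clicks": {"value": 3.0}, "displays": {"value": 50.0}},
            ]
        },
        "total_clicks": {"value": 8.0},
        "total_displays": {"value": 150.0},
    }
}

per_city, overall = rates_from_response(sample_response)
```

The same pattern would apply to a country field by swapping the field of the terms aggregation.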

On Wed, Jan 14, 2015 at 10:43 PM, Yan Georget  wrote:

> Hello,
>
> Let's imagine I am logging displays and clicks, say by cities.
> I can aggregate those by countries and I can also compute grand totals.
>
> Now I would like to compute click rates (clicks/displays) by cities,
> countries and I would also like to get a global click rate.
> How can I do this?
>
> It seems that I could use a scripted metric (I have not tried yet) but I
> would also like to expose these rates in Kibana.
>
> Is this possible?
>
> Thanks in advance,
> Yan Georget



-- 
Adrien Grand


how to combine aggregations

2015-01-14 Thread Yan Georget

Hello,

Let's imagine I am logging displays and clicks, say by cities.
I can aggregate those by countries and I can also compute grand totals.

Now I would like to compute click rates (clicks/displays) by cities, 
countries and I would also like to get a global click rate.
How can I do this?

It seems that I could use a scripted metric (I have not tried yet) but I 
would also like to expose these rates in Kibana.

Is this possible?

Thanks in advance,
Yan Georget


Re: Filter in nested aggregations

2015-01-14 Thread Loïc Wenkin

Hi Adrien,

Thanks a lot, it works!

Regards,
Loïc

On Wednesday, 14 January 2015 12:06:54 UTC+1, Adrien Grand wrote:
>
> Provided that `d` is mapped as a `nested` object, you can do it by using 
> the nested aggregation. Here are the aggregations that you would need on 
> each level
>  1. nested aggregation to be in the context of path='d'
>  2. filter aggregation on d.toto == 1
>  3. max aggregation on d.tutu
>
> On Wed, Jan 14, 2015 at 11:08 AM, Loïc Wenkin wrote:
>
>> Hi all,
>>
>> I was wondering if it is possible to apply aggregations only on a set of 
>> nested objects (not all nested objects presents inside a document), 
>> according to a property of this object? I think an example will help to 
>> understand.
>>
>> Let's say that you have the following documents in the index:
>>
>> {
>>"a": 2,
>>"b": 3,
>>"c": 4,
>>"d": [
>>   {
>>  "toto": 1,
>>  "tutu": 2,
>>  "titi": 3
>>   },
>>   {
>>  "toto": 2,
>>  "tutu": 5,
>>  "titi": 6
>>   },
>>   {
>>  "toto": 3,
>>  "tutu": 11,
>>  "titi": 8
>>   },
>>   {
>>  "toto": 4,
>>  "tutu": 7,
>>  "titi": 4
>>   }
>>]
>> }
>>
>>
>>
>>
>> {
>>"a": 3,
>>"b": 4,
>>"c": 5,
>>"d": [
>>   {
>>  "toto": 1,
>>  "tutu": 10,
>>  "titi": 6
>>   },
>>   {
>>  "toto": 2,
>>  "tutu": 65,
>>  "titi": 8
>>   },
>>   {
>>  "toto": 3,
>>  "tutu": 25,
>>  "titi": 15
>>   },
>>   {
>>  "toto": 4,
>>  "tutu": 30,
>>  "titi": 45
>>   }
>>]
>> }
>>
>>
>>
>> Where "d" is indexed as nested. I would like to get the maximum value for 
>> the "tutu" property from this kind of documents, but I would like to work 
>> only on nested objects where "toto" = 1. Here, I would like to get 10 
>> (the red one in the samples), and not 65.
>>
>> Is it possible to do this kind of aggregation? Or have I to change my 
>> document structure (or index mapping or ...)? If it is possible, could you 
>> point me how to do it? I had a look at nested aggregations and filter ones, 
>> but I am not sure that it could help.
>>
>> Thank you for your replies.
>>
>> Regards,
>> Loïc
>
>
>
> -- 
> Adrien Grand
>  


Re: Filter in nested aggregations

2015-01-14 Thread Adrien Grand

Provided that `d` is mapped as a `nested` object, you can do it by using
the nested aggregation. Here are the aggregations that you would need on
each level
 1. nested aggregation to be in the context of path='d'
 2. filter aggregation on d.toto == 1
 3. max aggregation on d.tutu
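
The three levels above can be sketched as the following request body, written as a Python dict. The aggregation names `in_d`, `toto_is_1` and `max_tutu` are arbitrary labels chosen for this example, and the small helper only emulates the expected result in plain Python on the two sample documents from the question.

```python
query = {
    "size": 0,  # only the aggregation result is needed
    "aggs": {
        "in_d": {
            # 1. switch to the context of the nested objects under "d"
            "nested": {"path": "d"},
            "aggs": {
                "toto_is_1": {
                    # 2. keep only the nested objects where d.toto == 1
                    "filter": {"term": {"d.toto": 1}},
                    "aggs": {
                        # 3. maximum of d.tutu over the remaining objects
                        "max_tutu": {"max": {"field": "d.tutu"}}
                    },
                }
            },
        }
    },
}


def expected_max_tutu(docs, toto=1):
    """Plain-Python emulation of what the aggregation should return."""
    values = [obj["tutu"] for doc in docs for obj in doc["d"]
              if obj["toto"] == toto]
    return max(values) if values else None


# The nested "d" arrays of the two sample documents from the question.
sample_docs = [
    {"d": [{"toto": 1, "tutu": 2}, {"toto": 2, "tutu": 5},
           {"toto": 3, "tutu": 11}, {"toto": 4, "tutu": 7}]},
    {"d": [{"toto": 1, "tutu": 10}, {"toto": 2, "tutu": 65},
           {"toto": 3, "tutu": 25}, {"toto": 4, "tutu": 30}]},
]

result = expected_max_tutu(sample_docs)
```

In the search response, the value would be found under `aggregations.in_d.toto_is_1.max_tutu.value`.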

On Wed, Jan 14, 2015 at 11:08 AM, Loïc Wenkin  wrote:

> Hi all,
>
> I was wondering if it is possible to apply aggregations only on a set of
> nested objects (not all nested objects presents inside a document),
> according to a property of this object? I think an example will help to
> understand.
>
> Let's say that you have the following documents in the index:
>
> {
>"a": 2,
>"b": 3,
>"c": 4,
>"d": [
>   {
>  "toto": 1,
>  "tutu": 2,
>  "titi": 3
>   },
>   {
>  "toto": 2,
>  "tutu": 5,
>  "titi": 6
>   },
>   {
>  "toto": 3,
>  "tutu": 11,
>  "titi": 8
>   },
>   {
>  "toto": 4,
>  "tutu": 7,
>  "titi": 4
>   }
>]
> }
>
>
>
>
> {
>"a": 3,
>"b": 4,
>"c": 5,
>"d": [
>   {
>  "toto": 1,
>  "tutu": 10,
>  "titi": 6
>   },
>   {
>  "toto": 2,
>  "tutu": 65,
>  "titi": 8
>   },
>   {
>  "toto": 3,
>  "tutu": 25,
>  "titi": 15
>   },
>   {
>  "toto": 4,
>  "tutu": 30,
>  "titi": 45
>   }
>]
> }
>
>
>
> Where "d" is indexed as nested. I would like to get the maximum value for
> the "tutu" property from this kind of documents, but I would like to work
> only on nested objects where "toto" = 1. Here, I would like to get 10
> (the red one in the samples), and not 65.
>
> Is it possible to do this kind of aggregation? Or have I to change my
> document structure (or index mapping or ...)? If it is possible, could you
> point me how to do it? I had a look at nested aggregations and filter ones,
> but I am not sure that it could help.
>
> Thank you for your replies.
>
> Regards,
> Loïc



-- 
Adrien Grand


Filter in nested aggregations

2015-01-14 Thread Loïc Wenkin

Hi all,

I was wondering if it is possible to apply aggregations to only a subset of 
the nested objects inside a document (not all of them), selected 
according to a property of each object? I think an example will help.

Let's say that you have the following documents in the index:

{
   "a": 2,
   "b": 3,
   "c": 4,
   "d": [
      {
         "toto": 1,
         "tutu": 2,
         "titi": 3
      },
      {
         "toto": 2,
         "tutu": 5,
         "titi": 6
      },
      {
         "toto": 3,
         "tutu": 11,
         "titi": 8
      },
      {
         "toto": 4,
         "tutu": 7,
         "titi": 4
      }
   ]
}




{
   "a": 3,
   "b": 4,
   "c": 5,
   "d": [
      {
         "toto": 1,
         "tutu": 10,
         "titi": 6
      },
      {
         "toto": 2,
         "tutu": 65,
         "titi": 8
      },
      {
         "toto": 3,
         "tutu": 25,
         "titi": 15
      },
      {
         "toto": 4,
         "tutu": 30,
         "titi": 45
      }
   ]
}



Where "d" is indexed as nested. I would like to get the maximum value of 
the "tutu" property from this kind of document, but only for nested 
objects where "toto" = 1. Here, I would like to get 10 (the "tutu" value 
of the "toto" = 1 object in the second document), and not 65.

Is it possible to do this kind of aggregation? Or do I have to change my 
document structure (or index mapping, or ...)? If it is possible, could you 
show me how to do it? I had a look at the nested and filter aggregations, 
but I am not sure that they would help.

Thank you for your replies.

Regards,
Loïc


TopHits aggregations generates exception with SearchType=Count

2015-01-12 Thread Leon Portman

Hello

I wanted to report a problem that only happens in multi-node 
configurations.
If an aggregation query contains a top hits aggregation but SearchType 
is set to "count", Elasticsearch 1.4.2 throws the following exception:

Failed to deserialize response of type 
[org.elasticsearch.search.query.QuerySearchResult]


Best Regards

Leon


Allow for disabled sorting on aggregations when request size is unlimited

2015-01-09 Thread Elliott Bradshaw

In cases where all aggregation results are required, sorting may not be.  
Heavy/large aggregations can be fairly CPU intensive.  To improve 
performance, it would be great to have an option to disable sorting of 
results at both the global and shard-local level.  Hash maps of results 
would be sufficient.


Optimize sub-aggregations when index is routed by primary bucket aggregation

2015-01-09 Thread Elliott Bradshaw

I'm not sure if anything like this is in place already, but when documents 
are routed by a given field that is later used as a primary bucket 
aggregation, all data for sub-aggregations will be shard local.  It could 
be beneficial to take advantage of that locality.  Has anyone explored or 
considered this?


Re: "Aggregations" without doc-counts

2015-01-05 Thread Elliott Bradshaw

I am only running a geohash grid aggregation.  I reduce the precision 
parameter as much as I can in each case.  Any guesses on where most of the 
time is being spent?  I could dig through the source...

On Monday, January 5, 2015 9:49:01 AM UTC-5, Adrien Grand wrote:
>
> No it wouldn't. I don't have ideas about how to improve performance, are 
> you running only a geohash grid aggregation or do you also have sub 
> aggregations? Also 1 million buckets is a lot, if it would work for you to 
> decrease the value of the precision parameter, this could help with 
> performance.
>
> On Mon, Jan 5, 2015 at 1:22 PM, Elliott Bradshaw wrote:
>
>> Just as a thought, would setting geohash = true or geohash_prefix = true 
>> at index time improve performance?
>>
>>
>> On Monday, January 5, 2015 7:20:32 AM UTC-5, Elliott Bradshaw wrote:
>>>
>>> Adrien,
>>>
>>> Thanks for that.  I had a feeling that that might be the case.
>>>
>>> Any tips on improving aggregation performance?  I'm working with a 20 
>>> shard index that is loaded on a 20 node cluster.  Geohash grid aggregations 
>>> on the entire data set (with the size set to unlimited - a requirement) can 
>>> take as long as 8 seconds (and return ~ 1 million buckets).  I am very 
>>> happy with that performance, but if there are any tricks to improve it I 
>>> would be glad to do so.
>>>
>>> Thanks,
>>>
>>> Elliott
>>>
>>> On Tuesday, December 30, 2014 11:48:52 AM UTC-5, Adrien Grand wrote:
>>>>
>>>> Hi Elliott,
>>>>
>>>> The overhead of computing the doc counts is actually low, I don't think 
>>>> you should worry about it.
>>>>
>>>> On Tue, Dec 30, 2014 at 5:12 PM, Elliott Bradshaw  
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm currently working on a project that visualizes geospatial data in 
>>>>> Elasticsearch.  One of the things I am doing is generating heatmaps with 
>>>>> the geohash grid aggregation.  I would like to take this to the extreme 
>>>>> case of gridding down to the individual pixel level to display raster 
>>>>> images of a data set, but I am not concerned with the total doc count of 
>>>>> each geohash.  Is there a way (or could it be implemented) where an 
>>>>> optimized aggregation could be run that simply lists the existing terms 
>>>>> (geohashes) and does not bother with aggregating their counts?  If this 
>>>>> significantly improved performance, such a feature would be very valuable.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> - Elliott Bradshaw
>>>>
>>>>
>>>>
>>>> -- 
>>>> Adrien Grand
>>>>  
>
>
>
> -- 
> Adrien Grand
>  


Re: "Aggregations" without doc-counts

2015-01-05 Thread Adrien Grand

No, it wouldn't. I don't have specific ideas for improving performance. Are
you running only a geohash grid aggregation, or do you also have
sub-aggregations? Also, 1 million buckets is a lot; if decreasing the value
of the precision parameter would work for you, that could help with
performance.
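
A rough sketch of why lowering the precision helps: each extra geohash character multiplies the number of possible cells by 32, so the bucket count can grow very quickly. The field name "location" and both helper names below are assumptions made for illustration; only the `geohash_grid` aggregation and its `field` and `precision` parameters come from Elasticsearch itself.

```python
# Sketch only: the helper names and the "location" field are hypothetical.

GEOHASH_BASE = 32  # a geohash character encodes 5 bits, i.e. 32 values


def max_cells(precision: int) -> int:
    """Upper bound on distinct geohash cells worldwide at a given precision."""
    if not 1 <= precision <= 12:
        raise ValueError("geohash precision must be between 1 and 12")
    return GEOHASH_BASE ** precision


def geohash_grid_body(field: str, precision: int) -> dict:
    """Build a search body that runs only a geohash_grid aggregation."""
    return {
        "size": 0,  # ask for no hits, only aggregation results
        "aggs": {
            "grid": {
                "geohash_grid": {"field": field, "precision": precision}
            }
        },
    }


if __name__ == "__main__":
    for p in (3, 4, 5):
        print(p, max_cells(p))
    print(geohash_grid_body("location", 4))
```

The bound is for the whole globe; the number of buckets actually returned depends on where the data lies, so this is only a guide to how precision and bucket count relate.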

On Mon, Jan 5, 2015 at 1:22 PM, Elliott Bradshaw 
wrote:

> Just as a thought, would setting geohash = true or geohash_prefix = true
> at index time improve performance?
>
>
> On Monday, January 5, 2015 7:20:32 AM UTC-5, Elliott Bradshaw wrote:
>>
>> Adrian,
>>
>> Thanks for that.  I had a feeling that that might be the case.
>>
>> Any tips on improving aggregation performance.  I'm working with a 20
>> shard index that is loaded on a 20 node cluster.  Geohash grid aggregations
>> on the entire data set (with the size set to unlimited - a requirement) can
>> take as long as 8 seconds (and return ~ 1 million buckets).  I am very
>> happy with that performance, but if there are any tricks to improve it I
>> would be glad to do so.
>>
>> Thanks,
>>
>> Elliott
>>
>> On Tuesday, December 30, 2014 11:48:52 AM UTC-5, Adrien Grand wrote:
>>>
>>> Hi Eliott,
>>>
>>> The overhead of computing the doc counts is actually low, I don't think
>>> you should worry about it.
>>>
>>> On Tue, Dec 30, 2014 at 5:12 PM, Elliott Bradshaw 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm currently working on a project that visualizes geospatial data in
>>>> Elasticsearch.  One of the things I am doing is generating heatmaps with
>>>> the geohash grid aggregation.  I would like to take this to the extreme
>>>> case of gridding down to the individual pixel level to display raster
>>>> images of a data set, but I am not concerned with the total doc count of
>>>> each geohash.  Is there a way (or could it be implemented) where an
>>>> optimized aggregation could be run that simply lists the existing terms
>>>> (geohashes) and does not bother with aggregating their counts?  If this
>>>> significantly improved performance, such a feature would be very valuable.
>>>>
>>>> Thanks!
>>>>
>>>> - Elliott Bradshaw
>>>>
>>>
>>>
>>>
>>> --
>>> Adrien Grand
>>>



-- 
Adrien Grand


Re: "Aggregations" without doc-counts

2015-01-05 Thread Elliott Bradshaw

Just as a thought, would setting geohash = true or geohash_prefix = true at 
index time improve performance?

On Monday, January 5, 2015 7:20:32 AM UTC-5, Elliott Bradshaw wrote:
>
> Adrian,
>
> Thanks for that.  I had a feeling that that might be the case.
>
> Any tips on improving aggregation performance.  I'm working with a 20 
> shard index that is loaded on a 20 node cluster.  Geohash grid aggregations 
> on the entire data set (with the size set to unlimited - a requirement) can 
> take as long as 8 seconds (and return ~ 1 million buckets).  I am very 
> happy with that performance, but if there are any tricks to improve it I 
> would be glad to do so.
>
> Thanks,
>
> Elliott
>
> On Tuesday, December 30, 2014 11:48:52 AM UTC-5, Adrien Grand wrote:
>>
>> Hi Eliott,
>>
>> The overhead of computing the doc counts is actually low, I don't think 
>> you should worry about it.
>>
>> On Tue, Dec 30, 2014 at 5:12 PM, Elliott Bradshaw  
>> wrote:
>>
>>> Hi,
>>>
>>> I'm currently working on a project that visualizes geospatial data in 
>>> Elasticsearch.  One of the things I am doing is generating heatmaps with 
>>> the geohash grid aggregation.  I would like to take this to the extreme 
>>> case of gridding down to the individual pixel level to display raster 
>>> images of a data set, but I am not concerned with the total doc count of 
>>> each geohash.  Is there a way (or could it be implemented) where an 
>>> optimized aggregation could be run that simply lists the existing terms 
>>> (geohashes) and does not bother with aggregating their counts?  If this 
>>> significantly improved performance, such a feature would be very valuable.
>>>
>>> Thanks!
>>>
>>> - Elliott Bradshaw
>>>
>>
>>
>>
>> -- 
>> Adrien Grand
>>  
>


Re: "Aggregations" without doc-counts

2015-01-05 Thread Elliott Bradshaw

Adrien,

Thanks for that.  I had a feeling that that might be the case.

Any tips on improving aggregation performance?  I'm working with a 20 shard 
index that is loaded on a 20 node cluster.  Geohash grid aggregations on 
the entire data set (with the size set to unlimited - a requirement) can 
take as long as 8 seconds (and return ~ 1 million buckets).  I am very 
happy with that performance, but if there are any tricks to improve it I 
would be glad to do so.

Thanks,

Elliott

On Tuesday, December 30, 2014 11:48:52 AM UTC-5, Adrien Grand wrote:
>
> Hi Eliott,
>
> The overhead of computing the doc counts is actually low, I don't think 
> you should worry about it.
>
> On Tue, Dec 30, 2014 at 5:12 PM, Elliott Bradshaw wrote:
>
>> Hi,
>>
>> I'm currently working on a project that visualizes geospatial data in 
>> Elasticsearch.  One of the things I am doing is generating heatmaps with 
>> the geohash grid aggregation.  I would like to take this to the extreme 
>> case of gridding down to the individual pixel level to display raster 
>> images of a data set, but I am not concerned with the total doc count of 
>> each geohash.  Is there a way (or could it be implemented) where an 
>> optimized aggregation could be run that simply lists the existing terms 
>> (geohashes) and does not bother with aggregating their counts?  If this 
>> significantly improved performance, such a feature would be very valuable.
>>
>> Thanks!
>>
>> - Elliott Bradshaw
>>
>
>
>
> -- 
> Adrien Grand
>  


Re: How to implement custom aggregations or algorithms

2015-01-04 Thread cto@TCS

Thank you so much! 

On Saturday, January 3, 2015 1:23:44 AM UTC+5:30, Adrien Grand wrote:
>
> There is a way, but it is not documented as the aggregations internal API 
> is not stable (and we kind-of like having this ability to break it whenever 
> we want...) so it's very likely that upgrading your code to a new release 
> would require important refactorings. Basically you need to write a plugin 
> that will call AggregationModule.addAggregatorParser with a parser for your 
> own aggregation. You can get inspiration from an existing aggregation such 
> as the stats aggregation (in org.elasticsearch.search.aggregations.stats) 
> to get started.
>
> That said if you think that you need is common, a better option might be 
> to open an issue so that we implement it and expose it in Elasticsearch, 
> this way you would not have to maintain it yourself.
>
> On Thu, Jan 1, 2015 at 10:39 AM, cto@TCS wrote:
>
>> Hi,
>>
>> I have been using ElasticSearch for the past 2 months and have tried 
>> various aggregations and filtering options on my data and is very happy 
>> with the performance.
>> But, recently there is a requirement to implement some algorithms on my 
>> data set while retrieving data from ElasticSearch. This requirement is not 
>> solved by existing aggregations or filters.
>>
>> So, is there a way of implementing any algorithm or custom aggregations 
>> on the data present in ElasticSearch ???
>>
>> Thanks
>> cto@TCS
>>
>
>
>
> -- 
> Adrien Grand
>  


Re: How to implement custom aggregations or algorithms

2015-01-02 Thread Adrien Grand

There is a way, but it is not documented because the internal aggregations API
is not stable (and we kind of like being able to break it whenever we
want...), so upgrading your code to a new release would very likely require
significant refactoring. Basically, you need to write a plugin that calls
AggregationModule.addAggregatorParser with a parser for your own aggregation.
You can take inspiration from an existing aggregation, such as the stats
aggregation (in org.elasticsearch.search.aggregations.stats), to get started.

That said, if you think your need is a common one, a better option might be to
open an issue so that we implement it and expose it in Elasticsearch; this
way you would not have to maintain it yourself.

On Thu, Jan 1, 2015 at 10:39 AM, cto@TCS  wrote:

> Hi,
>
> I have been using ElasticSearch for the past 2 months and have tried
> various aggregations and filtering options on my data and is very happy
> with the performance.
> But, recently there is a requirement to implement some algorithms on my
> data set while retrieving data from ElasticSearch. This requirement is not
> solved by existing aggregations or filters.
>
> So, is there a way of implementing any algorithm or custom aggregations on
> the data present in ElasticSearch ???
>
> Thanks
> cto@TCS
>

-- 
Adrien Grand


How to implement custom aggregations or algorithms

2015-01-01 Thread cto@TCS

Hi,

I have been using ElasticSearch for the past 2 months and have tried 
various aggregations and filtering options on my data, and I am very happy 
with the performance.
But recently a requirement came up to run some algorithms on my data set 
while retrieving data from ElasticSearch. This requirement is not 
solved by the existing aggregations or filters.

So, is there a way of implementing an algorithm or a custom aggregation on 
the data present in ElasticSearch?

Thanks
cto@TCS


Re: "Aggregations" without doc-counts

2014-12-30 Thread Adrien Grand

Hi Elliott,

The overhead of computing the doc counts is actually low; I don't think you
should worry about it.

On Tue, Dec 30, 2014 at 5:12 PM, Elliott Bradshaw 
wrote:

> Hi,
>
> I'm currently working on a project that visualizes geospatial data in
> Elasticsearch.  One of the things I am doing is generating heatmaps with
> the geohash grid aggregation.  I would like to take this to the extreme
> case of gridding down to the individual pixel level to display raster
> images of a data set, but I am not concerned with the total doc count of
> each geohash.  Is there a way (or could it be implemented) where an
> optimized aggregation could be run that simply lists the existing terms
> (geohashes) and does not bother with aggregating their counts?  If this
> significantly improved performance, such a feature would be very valuable.
>
> Thanks!
>
> - Elliott Bradshaw
>



-- 
Adrien Grand


"Aggregations" without doc-counts

2014-12-30 Thread Elliott Bradshaw

Hi,

I'm currently working on a project that visualizes geospatial data in 
Elasticsearch.  One of the things I am doing is generating heatmaps with 
the geohash grid aggregation.  I would like to take this to the extreme 
case of gridding down to the individual pixel level to display raster 
images of a data set, but I am not concerned with the total doc count of 
each geohash.  Is there a way (or could it be implemented) where an 
optimized aggregation could be run that simply lists the existing terms 
(geohashes) and does not bother with aggregating their counts?  If this 
significantly improved performance, such a feature would be very valuable.

Thanks!

- Elliott Bradshaw


Aggregations Parallelism Level

2014-12-26 Thread AlexR

Hi,

What's the level of parallelism when aggregations are calculated? Is it one 
thread per shard?
If so, I assume a node hosting one index should have roughly one 
shard per server core?

Is it the same for searching, or does Lucene support parallel search on the 
same index (ES shard)?

Thanks
Alex
 


Re: Wrong results on Terms Aggregations

2014-12-26 Thread Anantha Govindarajan

Hi Adrien,

This bug was caused by a threading issue in our application. Sorry for the 
mistaken question.


Re: Term Length when doing aggregations

2014-12-22 Thread Adrien Grand

If you have this requirement, I suspect the reason is that you are
indexing some free text and would like to exclude meaningless terms from
the analysis. If so, the best way to solve this would be to use
a `length` token filter at indexing time so that these terms are not indexed
at all:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-length-tokenfilter.html
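
As a minimal sketch of what such index settings might look like, the helper below builds an analysis section with a `length` token filter (using its `min` parameter) and, for the skip-word list, a `stop` token filter. The `stop` filter is an addition not mentioned in the reply above, and the names `min_length_filter`, `skip_words`, `agg_text` and `analysis_settings` are hypothetical.

```python
# Sketch only: filter, analyzer and function names are illustrative.
import json


def analysis_settings(min_length: int, stopwords: list) -> dict:
    """Build index settings that drop short tokens and listed words at index time."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # drops tokens shorter than min_length characters
                    "min_length_filter": {"type": "length", "min": min_length},
                    # drops the words in the skip list
                    "skip_words": {"type": "stop", "stopwords": stopwords},
                },
                "analyzer": {
                    "agg_text": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "min_length_filter", "skip_words"],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(analysis_settings(3, ["the", "and"]), indent=2))
```

The resulting body would be sent when creating the index, and the field to aggregate on would then be mapped with the `agg_text` analyzer; existing documents would need reindexing for the filters to take effect.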

On Mon, Dec 22, 2014 at 5:45 PM, Bruno Kamiche  wrote:

> Is there any way to set the mininum term length for aggregations?
>
> I mean, I need aggregations of words with a minimum length of 3 characters
> for example, also is there any wait to have an skip word list?
>
> Bruno
>



-- 
Adrien Grand


Term Length when doing aggregations

2014-12-22 Thread Bruno Kamiche

Is there any way to set the minimum term length for aggregations? 

I mean, I need aggregations of words with a minimum length of 3 characters, 
for example. Also, is there any way to have a skip-word list?

Bruno


Re: Nested object aggregations in Kibana 4

2014-12-21 Thread Raz Lachyani

Anyone?

On Monday, November 17, 2014 11:21:29 AM UTC+2, Raz Lachyani wrote:
>
> Hi Guy,
>
> I played a little bit with the Kibana 4 and this is an amazing tool !!! 
> I have two questions:
>
> 1. I couldn't create nested objects aggregation, is it supported ? and if 
> not in which Beta release will it work ?
> 2. Currently I can only visualize two aggregations (one of them is sub 
> aggregation) in the same chart. Will it be possible to visualize more than 
> one sub aggregation in the same chart ?
>
> Thanks,
> Great work guy. and also so fast ... when do you have the time to sleep ;-)
>


aggregations and scripted expressions

2014-12-12 Thread msbreuer

I am building a date_histogram query using aggregations. Each bucket has two 
sub-aggregations: the first sums the number of activities, and the 
second sums the duration per activity.

My query looks like this:

"aggregations": {
  "over-time": {
    "date_histogram": {
      "field": "timestamp",
      "interval": "1M"
    },
    "aggregations": {
      "activities": {
        "sum": {
          "field": "activity-count"
        }
      },
      "duration": {
        "sum": {
          "field": "activity-duration"
        }
      }
    }
  }
}

Question: I need a field containing the result of "activity-duration / 
activity-count" for each bucket. 

I was thinking of a scripted expression, but how would I use it here? ES 
1.4 introduced the scripted_metric aggregation (as experimental), but I 
don't think that is what I am looking for. 

Any idea?

regards,
markus
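
One option that avoids scripting altogether is to compute the ratio on the client from the two sums already returned per bucket. The sketch below assumes the response has the usual shape for the query above (buckets under "over-time", each with "activities" and "duration" values); the function name and the output keys are hypothetical.

```python
# Sketch only: post-processes a search response on the client side.
# The function name and the "duration_per_activity" key are illustrative.


def duration_per_activity(response: dict) -> list:
    """Return one row per date_histogram bucket with duration / count."""
    buckets = response["aggregations"]["over-time"]["buckets"]
    rows = []
    for bucket in buckets:
        count = bucket["activities"]["value"]
        duration = bucket["duration"]["value"]
        # Empty buckets have a zero count; report None instead of dividing.
        ratio = duration / count if count else None
        rows.append({"key": bucket["key"], "duration_per_activity": ratio})
    return rows


if __name__ == "__main__":
    sample = {
        "aggregations": {
            "over-time": {
                "buckets": [
                    {"key": 1417392000000, "doc_count": 3,
                     "activities": {"value": 4.0}, "duration": {"value": 10.0}},
                ]
            }
        }
    }
    print(duration_per_activity(sample))
```

This keeps the query unchanged, at the cost of doing the division outside Elasticsearch, so the ratio cannot be used for ordering or filtering inside the query.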


Re: is it possible to avoid / ignore ClassCastExceptions in aggregations?

2014-12-09 Thread Nuno Lopes

Good idea, that might work for the "inner" fields; however, the "string" field 
is always present, so we'll still get the error.  I might play around with 
changing the mapping to try to adopt your solution if all else fails.

On Tuesday, December 9, 2014 6:11:02 PM UTC, David Pilato wrote:
>
> May be a filter agg using a exist filter could help?
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
> On 9 Dec 2014, at 19:01, Nuno Lopes wrote:
>
> Hi David, Thanks for the reply.  
>
> Actually a more adequate example is below.  What we're trying to do is 
> apply the same aggregation over the different types even though we already 
> know some aggregations will fail (only using string and integer for this 
> example).   However, is there a way to ignore the ones that fail and have 
> the results for the valid aggregations?  The behaviour I'm finding is that 
> as long as one aggregation is invalid, we just get the error back.  And 
> would such solution also work over nested aggregations?
>
> POST /test/_search
> {
>   "aggregations": {
>     "a1": {
>       "max": {
>         "field": "value"
>       }
>     },
>     "a2": {
>       "max": {
>         "field": "value.integer"
>       }
>     }
>   }
> }
>
>
> On Tuesday, December 9, 2014 5:53:19 PM UTC, David Pilato wrote:
>>
>> If you want to compute on numerical values, you need to use the right 
>> field name.
>> value is a String. So you can’t use it for max agg.
>>
>> You could try with "value.integer". I think it should work.
>>
>>
>> -- 
>> David Pilato | Technical Advocate | Elasticsearch.com
>> @dadoonet | @elasticsearchfr | @scrutmydocs
>>
>>
>>  
>> On 9 Dec 2014, at 18:47, Nuno Lopes wrote:
>>
>> Hello all,
>>
>>
>> Consider the following mapping:
>>
>> PUT /test/value/_mapping
>> {
>>   "properties": {
>>     "value": {
>>       "type": "string",
>>       "index": "analyzed",
>>       "index_analyzer": "standard",
>>       "search_analyzer": "standard",
>>       "fields": {
>>         "integer": {
>>           "type": "integer",
>>           "ignore_malformed": true
>>         },
>>         "double": {
>>           "type": "double",
>>           "ignore_malformed": true
>>         },
>>         "date": {
>>           "type": "date",
>>           "ignore_malformed": true
>>         }
>>       }
>>     }
>>   }
>> }
>>
>>
>>
>> with simply these documents:
>>
>> PUT /test/value/1
>> {
>>   "value": "v1"
>> }
>>
>> PUT /test/value/2
>> {
>>   "value": "2"
>> }
>>
>>
>>
>>
>> Writing this aggregation the whole response consists of a 
>> ClassCastException:
>>
>> POST /test/_search
>> {
>>   "aggregations": {
>>     "a1": {
>>       "terms": {
>>         "field": "value"
>>       }
>>     },
>>     "a2": {
>>       "max": {
>>         "field": "value"
>>       }
>>     }
>>   }
>> }
>>
>>
>>
>> When I'm writing this query I know this is the case but in my application 
>> I'm generating different kinds of aggregations (from which some will give 
>> ClassCastExceptions but I have no easy way of knowing which beforehand). 
>>  Is there a way to ignore an aggregation that is invalid and return the 
>> results of the ones which are valid? And similarly, would this work for 
>> nested aggregations? 
>>
>> Thank you, best regards,
>> --
>> Nuno Lopes 
>>

Re: is it possible to avoid / ignore ClassCastExceptions in aggregations?

2014-12-09 Thread David Pilato

Maybe a filter agg using an exists filter could help?

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
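
A minimal sketch of that suggestion, written as a request-body builder: a `filter` aggregation with an `exists` filter wraps the `max` aggregation, so the metric only sees documents where the numeric sub-field was indexed. The helper name `guarded_max` and the inner aggregation name `max_value` are hypothetical. Note that this only guards against missing values; it is an assumption that it helps in a given setup, and it does not change the type of a field that is mapped as a string.

```python
# Sketch only: helper and inner aggregation names are illustrative.


def guarded_max(agg_name: str, field: str) -> dict:
    """Wrap a max aggregation in a filter aggregation that keeps only
    documents where `field` exists."""
    return {
        agg_name: {
            "filter": {"exists": {"field": field}},
            "aggs": {"max_value": {"max": {"field": field}}},
        }
    }


# Build a search body with one guarded aggregation on the integer sub-field.
body = {"size": 0, "aggregations": {}}
body["aggregations"].update(guarded_max("a2", "value.integer"))

if __name__ == "__main__":
    print(body)
```

With this shape the result for "a2" is read from the nested "max_value" entry, alongside the filter's own doc_count.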

> On 9 Dec 2014, at 19:01, Nuno Lopes wrote:
> 
> Hi David, Thanks for the reply.  
> 
> Actually a more adequate example is below.  What we're trying to do is apply 
> the same aggregation over the different types even though we already know 
> some aggregations will fail (only using string and integer for this example). 
>   However, is there a way to ignore the ones that fail and have the results 
> for the valid aggregations?  The behaviour I'm finding is that as long as one 
> aggregation is invalid, we just get the error back.  And would such solution 
> also work over nested aggregations?
> 
> POST /test/_search
> {
>   "aggregations": {
>     "a1": {
>       "max": {
>         "field": "value"
>       }
>     },
>     "a2": {
>       "max": {
>         "field": "value.integer"
>       }
>     }
>   }
> }
> 
> 
>> On Tuesday, December 9, 2014 5:53:19 PM UTC, David Pilato wrote:
>> If you want to compute on numerical values, you need to use the right field 
>> name.
>> value is a String. So you can’t use it for max agg.
>> 
>> You could try with "value.integer". I think it should work.
>> 
>> 
>> -- 
>> David Pilato | Technical Advocate | Elasticsearch.com
>> @dadoonet | @elasticsearchfr | @scrutmydocs
>> 
>> 
>> 
>>> On 9 Dec 2014, at 18:47, Nuno Lopes wrote:
>>> 
>>> Hello all,
>>> 
>>> 
>>> Consider the following mapping:
>>> 
>>> PUT /test/value/_mapping
>>> {
>>>   "properties": {
>>>     "value": {
>>>       "type": "string",
>>>       "index": "analyzed",
>>>       "index_analyzer": "standard",
>>>       "search_analyzer": "standard",
>>>       "fields": {
>>>         "integer": {
>>>           "type": "integer",
>>>           "ignore_malformed": true
>>>         },
>>>         "double": {
>>>           "type": "double",
>>>           "ignore_malformed": true
>>>         },
>>>         "date": {
>>>           "type": "date",
>>>           "ignore_malformed": true
>>>         }
>>>       }
>>>     }
>>>   }
>>> }
>>> 
>>> 
>>> 
>>> with simply these documents:
>>> 
>>> PUT /test/value/1
>>> {
>>>   "value": "v1"
>>> }
>>>
>>> PUT /test/value/2
>>> {
>>>   "value": "2"
>>> }
>>> 
>>> 
>>> 
>>> 
>>> Writing this aggregation the whole response consists of a 
>>> ClassCastException:
>>> 
>>> POST /test/_search
>>> {
>>>   "aggregations": {
>>>     "a1": {
>>>       "terms": {
>>>         "field": "value"
>>>       }
>>>     },
>>>     "a2": {
>>>       "max": {
>>>         "field": "value"
>>>       }
>>>     }
>>>   }
>>> }
>>> 
>>> 
>>> 
>>> When I'm writing this query I know this is the case but in my application 
>>> I'm generating different kinds of aggregations (from which some will give 
>>> ClassCastExceptions but I have no easy way of knowing which beforehand).  
>>> Is there a way to ignore an aggregation that is invalid and return the 
>>> results of the ones which are valid? And similarly, would this work for 
>>> nested aggregations? 
>>> 
>>> Thank you, best regards,
>>> --
>>> Nuno Lopes 
>>> 

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/A00603D0-C594-4790-A99E-33CD00F034E5%40pilato.fr.
For more options, visit https://groups.google.com/d/optout.

Re: is it possible to avoid / ignore ClassCastExceptions in aggregations?

2014-12-09 Thread Nuno Lopes

Hi David, thanks for the reply.

Actually, a more representative example is below. We are trying to apply 
the same aggregation to each of the typed fields, even though we already 
know some of those aggregations will fail (this example uses only the 
string and integer fields). Is there a way to ignore the ones that fail and 
still get the results for the valid aggregations? The behaviour I'm seeing 
is that as soon as one aggregation is invalid, we just get the error back. 
Would such a solution also work for nested aggregations?

POST /test/_search
{
   "aggregations": {
      "a1": {
         "max": {
            "field": "value"
         }
      },
      "a2": {
         "max": {
            "field": "value.integer"
         }
      }
   }
}
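
One possible client-side workaround, sketched here under assumptions and not 
confirmed anywhere in this thread, is to check each generated aggregation 
against the mapping before sending the request and drop the ones that would 
fail. FIELD_TYPES, NUMERIC_TYPES, NUMERIC_ONLY_AGGS, is_valid and prune_aggs 
are hypothetical names for this illustration, not Elasticsearch APIs, and the 
list of aggregations that need a numeric field is an assumption.

```python
# Field types as declared in the mapping from this thread (sub-fields included).
FIELD_TYPES = {
    "value": "string",
    "value.integer": "integer",
    "value.double": "double",
    "value.date": "date",
}

# Assumption: these metric aggregations only accept numeric or date fields.
NUMERIC_TYPES = {"integer", "long", "short", "byte", "float", "double", "date"}
NUMERIC_ONLY_AGGS = {"max", "min", "sum", "avg", "stats"}
AGG_CONTAINER_KEYS = ("aggs", "aggregations")


def is_valid(agg_body, field_types):
    """Return False when a numeric-only aggregation targets a non-numeric field."""
    for agg_type, params in agg_body.items():
        if agg_type in AGG_CONTAINER_KEYS:
            continue
        if agg_type in NUMERIC_ONLY_AGGS:
            field = params.get("field")
            # Fields missing from the mapping are treated as invalid.
            if field is not None and field_types.get(field) not in NUMERIC_TYPES:
                return False
    return True


def prune_aggs(aggs, field_types):
    """Return a copy of aggs without invalid entries, recursing into sub-aggregations."""
    kept = {}
    for name, body in aggs.items():
        if not is_valid(body, field_types):
            continue
        new_body = {}
        for key, val in body.items():
            if key in AGG_CONTAINER_KEYS:
                sub = prune_aggs(val, field_types)
                if sub:  # omit the container when no sub-aggregation survives
                    new_body[key] = sub
            else:
                new_body[key] = val
        kept[name] = new_body
    return kept


request_aggs = {
    "a1": {"max": {"field": "value"}},
    "a2": {"max": {"field": "value.integer"}},
}
pruned = prune_aggs(request_aggs, FIELD_TYPES)
print(pruned)  # {'a2': {'max': {'field': 'value.integer'}}}
```

The pruned dictionary would then be sent as the "aggregations" body. Because 
prune_aggs recurses into "aggs" and "aggregations", the same check applies to 
nested aggregations.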


On Tuesday, December 9, 2014 5:53:19 PM UTC, David Pilato wrote:
>
> If you want to compute on numerical values, you need to use the right 
> field name.
> value is a String. So you can’t use it for max agg.
>
> You could try with "value.integer". I think it should work.


Re: is it possible to avoid / ignore ClassCastExceptions in aggregations?

2014-12-09 Thread David Pilato

If you want to compute on numerical values, you need to use the right field 
name.
"value" is a string field, so you can’t use it for a max aggregation.

You could try with "value.integer". I think it should work.
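
As a rough illustration of why "value.integer" should behave differently: 
with "ignore_malformed" set to true, a value that cannot be parsed as an 
integer is skipped for that sub-field instead of rejecting the document, so 
only parseable values contribute to the max. The sketch below imitates that 
in plain Python. index_integer_subfield is a hypothetical helper, not 
Elasticsearch code.

```python
def index_integer_subfield(docs):
    """Imitate an integer sub-field with ignore_malformed: keep parseable values only."""
    indexed = {}
    for doc_id, source in docs.items():
        try:
            indexed[doc_id] = int(source["value"])
        except ValueError:
            # Malformed for an integer field: skipped, the document itself is kept.
            pass
    return indexed


# The two documents from the original post.
docs = {"1": {"value": "v1"}, "2": {"value": "2"}}

integer_values = index_integer_subfield(docs)
max_value = max(integer_values.values()) if integer_values else None
print(integer_values, max_value)  # {'2': 2} 2
```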


-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet <https://twitter.com/dadoonet> | @elasticsearchfr 
<https://twitter.com/elasticsearchfr> | @scrutmydocs 
<https://twitter.com/scrutmydocs>




