Re: spark version, elasticsearch-hadoop version, akka version sync up
Thank you for the summary - you are confirming (as a sanity check for myself):

- elasticsearch-hadoop Beta3 (not snapshot) on Spark core 1.1 only
- elasticsearch-hadoop Beta3-SNAPSHOT with Spark core 1.1, 1.2 and 1.3 -- as long as I don't use Spark SQL when using 1.2 and 1.3

Costin - I am amazed by your ability to keep all this straight - my head would explode dealing with all the dependencies in flux. Kudos to you.

On Tuesday, March 17, 2015 at 2:12:06 PM UTC-7, Costin Leau wrote:
>
> es-hadoop doesn't depend on Akka, only on Spark. The Scala version that
> es-hadoop is compiled against matches the one used by the Spark version
> it is compiled against for each release - typically this shouldn't pose
> a problem.
>
> Unfortunately, despite the minor version increments, some of the Spark
> APIs or components (in particular Spark SQL) have changed drastically
> between releases, breaking backwards compatibility. For example, Beta3
> works up to Spark 1.1 (which was the latest stable release at the time
> it shipped) but not with 1.2. This is fixed in master; however, the
> current dev build doesn't work with Spark SQL in the newly released 1.3
> (it does work with Spark core).
>
> This has already been fixed locally; however, I'm having difficulty
> preserving compatibility across both the Spark SQL 1.2 and 1.3 releases.
>
> Long story short, as long as the dependencies for Spark are in order,
> the same should apply for es-hadoop as well, since it relies only on
> Spark (and Scala, of course).
>
> On Tue, Mar 17, 2015 at 10:43 PM, Jeff Steinmetz wrote:
>
>> There are plenty of Spark / Akka / Scala / elasticsearch-hadoop
>> dependencies to keep track of.
>>
>> Is it true that elasticsearch-hadoop needs to be compiled for a
>> specific Spark version to run correctly on the cluster? I'm also
>> trying to keep track of the Akka version and Scala version. I.e.,
>> will es-hadoop compiled for Spark 1.2 work with Spark 1.3?
>>
>> When the elasticsearch-hadoop versions are released, as v2.0, v2.1,
>> v2.1.0.Beta3, at what point do we need to keep in mind which Spark
>> version it was compiled against?
>> I.e., is it safe to assume the es-hadoop versions are tied to a
>> specific Spark core version?
>>
>> I've been keeping the following chart in my notes to track all the
>> versions and dependencies:
>>
>> Akka Version Dependencies
>> Current Akka Stable Release: 2.3.9
>>
>> Elasticsearch-Hadoop 2.1.0.Beta3 = Spark 1.1.0
>> Elasticsearch-Hadoop 2.1.0.Beta3-SNAPSHOT = Spark 1.2.1
>> Elasticsearch-Hadoop: what about Spark 1.3?
>>
>> Spark 1.3, Akka 2.3.4-spark
>> Spark 1.2, Akka 2.3.4-spark
>> Spark 1.1, Akka 2.2.3-shaded-protobuf
>>
>> Activator 1.2.12 comes with Akka 2.3.4
>>
>> Play 2.3.8, Akka 2.3.4, Scala 2.11.1 (will also work with 2.10.4)
>> Play 2.2.x, Akka 2.2.0
>>
>> Spark Job Server 0.4.1, Spark Core 1.1.0, Akka 2.2.4
>> Spark Job Server master as of Feb 22, 2015: Spark Core 1.2.0,
>> Akka 2.3.4, Scala 2.10.4
>>
>> Akka Persistence: latest, 2.3.4 or later
>> Akka 2.3.9 is released for Scala 2.10.4 and 2.11.5
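For anyone pinning the combination confirmed above in a build, a minimal build.sbt sketch (the group/artifact coordinates are the standard Maven Central ones for these projects; versions follow this thread - adjust to your cluster):

    scalaVersion := "2.10.4"

    libraryDependencies ++= Seq(
      // Spark core 1.1.x pairs with elasticsearch-hadoop 2.1.0.Beta3 per this thread
      "org.apache.spark"  %% "spark-core"           % "1.1.0" % "provided",
      "org.elasticsearch" %  "elasticsearch-hadoop" % "2.1.0.Beta3"
    )

Marking spark-core as "provided" keeps the assembly from shipping a second Spark alongside the one already on the cluster.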
spark version, elasticsearch-hadoop version, akka version sync up
There are plenty of Spark / Akka / Scala / elasticsearch-hadoop dependencies to keep track of.

Is it true that elasticsearch-hadoop needs to be compiled for a specific Spark version to run correctly on the cluster? I'm also trying to keep track of the Akka version and Scala version. I.e., will es-hadoop compiled for Spark 1.2 work with Spark 1.3?

When the elasticsearch-hadoop versions are released, as v2.0, v2.1, v2.1.0.Beta3, at what point do we need to keep in mind which Spark version each was compiled against?
I.e., is it safe to assume the es-hadoop versions are tied to a specific Spark core version?

I've been keeping the following chart in my notes to track all the versions and dependencies:

Akka Version Dependencies
Current Akka Stable Release: 2.3.9

Elasticsearch-Hadoop 2.1.0.Beta3 = Spark 1.1.0
Elasticsearch-Hadoop 2.1.0.Beta3-SNAPSHOT = Spark 1.2.1
Elasticsearch-Hadoop: what about Spark 1.3?

Spark 1.3, Akka 2.3.4-spark
Spark 1.2, Akka 2.3.4-spark
Spark 1.1, Akka 2.2.3-shaded-protobuf

Activator 1.2.12 comes with Akka 2.3.4

Play 2.3.8, Akka 2.3.4, Scala 2.11.1 (will also work with 2.10.4)
Play 2.2.x, Akka 2.2.0

Spark Job Server 0.4.1, Spark Core 1.1.0, Akka 2.2.4
Spark Job Server master as of Feb 22, 2015: Spark Core 1.2.0, Akka 2.3.4, Scala 2.10.4

Akka Persistence: latest, 2.3.4 or later
Akka 2.3.9 is released for Scala 2.10.4 and 2.11.5
Re: function_score weight of 0.0 returns 1
BTW, I am running against Elasticsearch version 1.4.4

On Monday, February 23, 2015 at 6:49:59 PM UTC-8, Jeff Steinmetz wrote:
>
> Any ideas why this would return a score of 1 when the last (bool
> must_not) filter matches, even though the weight is set to 0.0?
>
> As a bit of background, this provides custom scoring where:
> - if one field matches, score 0.35
> - if another, more important field matches, score 0.65
> - if both match, score 1.0 (the sum)
> - if neither matches, score 0
>
> If I change the no-match weight to -1 or something very small, like
> 0.01, the weight for the must_not match comes back accurately.
> I am thinking this is some kind of divide-by-zero error handler in ES
> that returns 1 when a score hits 0?
>
> GET test/document/_search
> {
>   "query": {
>     "function_score": {
>       "filter": { "term": { "type": "animal" } },
>       "functions": [
>         {
>           "filter": {
>             "bool": {
>               "must": { "term": { "content": "fish" } }
>             }
>           },
>           "weight": 0.35
>         },
>         {
>           "filter": {
>             "bool": {
>               "must": { "term": { "user_description": "fish" } }
>             }
>           },
>           "weight": 0.65
>         },
>         {
>           "filter": {
>             "bool": {
>               "must_not": [
>                 { "term": { "user_description": "fish" } },
>                 { "term": { "content": "fish" } }
>               ]
>             }
>           },
>           "weight": 0.0
>         }
>       ],
>       "score_mode": "sum"
>     }
>   }
> }
function_score weight of 0.0 returns 1
Any ideas why this would return a score of 1 when the last (bool must_not) filter matches, even though the weight is set to 0.0?

As a bit of background, this provides custom scoring where:
- if one field matches, score 0.35
- if another, more important field matches, score 0.65
- if both match, score 1.0 (the sum)
- if neither matches, score 0

If I change the no-match weight to -1 or something very small, like 0.01, the weight for the must_not match comes back accurately.
I am thinking this is some kind of divide-by-zero error handler in ES that returns 1 when a score hits 0?

GET test/document/_search
{
  "query": {
    "function_score": {
      "filter": { "term": { "type": "animal" } },
      "functions": [
        {
          "filter": {
            "bool": {
              "must": { "term": { "content": "fish" } }
            }
          },
          "weight": 0.35
        },
        {
          "filter": {
            "bool": {
              "must": { "term": { "user_description": "fish" } }
            }
          },
          "weight": 0.65
        },
        {
          "filter": {
            "bool": {
              "must_not": [
                { "term": { "user_description": "fish" } },
                { "term": { "content": "fish" } }
              ]
            }
          },
          "weight": 0.0
        }
      ],
      "score_mode": "sum"
    }
  }
}
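For anyone reproducing this from the Java client instead of the REST API, a rough Scala equivalent of the query above (a sketch only, assuming the ES 1.4.x Java client, where ScoreFunctionBuilders.weightFactorFunction carries the "weight" parameter; adjust builder names to your client version):

    import org.elasticsearch.index.query.{FilterBuilders, QueryBuilders}
    import org.elasticsearch.index.query.functionscore.ScoreFunctionBuilders.weightFactorFunction

    // same three weighted filters as above, combined with score_mode "sum"
    val query = QueryBuilders.functionScoreQuery(FilterBuilders.termFilter("type", "animal"))
      .add(FilterBuilders.termFilter("content", "fish"), weightFactorFunction(0.35f))
      .add(FilterBuilders.termFilter("user_description", "fish"), weightFactorFunction(0.65f))
      .add(FilterBuilders.boolFilter()
             .mustNot(FilterBuilders.termFilter("user_description", "fish"))
             .mustNot(FilterBuilders.termFilter("content", "fish")),
           weightFactorFunction(0.0f))   // the 0.0 weight in question
      .scoreMode("sum")

This builds the same request, so it should reproduce the same weight-0.0 behavior described above.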
Access nested keys in Groovy
Is there a way to easily check for the existence of a key, and subsequently access keys, in a complex object (nested, maybe at times 3 levels deep, unfortunately)?

Simplified example doc:

{
  "level1" : {
    "level2" : {
      "links" : ["http://bit.ly/abc", "http://bit.ly/abc", "http://bit.ly/def", "http://bit.ly/ghi"]
    }
  },
  "title" : "test123"
}

This doesn't seem to work (from what I can tell it never finds the key level1.level2.links even when it does exist):

"script" : "if (ctx._source.containsKey('level1.level2.links')) { ctx._source.links_url_count = ctx._source['level1.level2.links'].size() } else { ctx._source.links_url_count = 0 }"

Simple keys work though, like ctx._source.containsKey('title').

Does it require intermediate checks for every level (which could be a messy script)?
Re: counting items in a list [array] returns (what we think) are incorrect counts via groovy
Now that I am into the real world scenario, it gets a bit trickier - I have nested objects (keys). I have to test for the existence of the key in the Groovy script to avoid parsing errors on insert.

How do you access a nested object in Groovy, and test for the existence of a nested object key? Such as in this example:

curl -XPOST 'http://'$NODE':9200/'$INDEX_NAME'/post' -d '{
  "titles": ["title 1", "title 2", "title 3", "title 4"],
  "raw" : {
    "links" : ["http://bit.ly/abc", "http://bit.ly/abc", "http://bit.ly/def", "http://bit.ly/ghi"]
  }
}'

This doesn't seem to work (from what I can tell it never finds the key raw.links even when it does exist):

"script" : "if (ctx._source.containsKey('raw.links')) { ctx._source.links_url_count = ctx._source['raw.links'].size() } else { ctx._source.links_url_count = 0 }"

Simple keys work though, like ctx._source.containsKey('title').

On Thursday, January 8, 2015 at 9:59:56 PM UTC-8, Nikolas Everett wrote:
>
> Transform never saves to source. You have to transform on the
> application side for that. It was designed for times when you want to
> index something like this that would just take up extra space in the
> source document. I imagine you could use a script field on the query if
> you need the result to contain the count. Or just count it on the
> result side.
>
> Nik
> On Jan 9, 2015 12:43 AM, "Jeff Steinmetz" wrote:
>
>> Transform worked well. Nice.
>>
>> Curious how to get it to save to source? Tried this below, no go. (I
>> can however do range queries against title_count, so the transform was
>> indexed and works well.)
>>
>> "transform" : {
>>   "script" : "ctx._source['title_count'] = ctx._source['titles'].size()",
>>   "lang" : "groovy"
>> },
>> "properties" : {
>>   "titles" : { "type" : "string", "index" : "not_analyzed" },
>>   "title_count" : { "type" : "integer", "store" : "yes" }
>> }
>> }'
>>
>> On Thursday, January 8, 2015 at 9:15:28 PM UTC-8, Nikolas Everett wrote:
>>>
>>> Source is going to be pretty slow, yeah. If it's a one-off then it's
>>> probably fine, but if you do it a lot it's probably best to index the
>>> count.
>>> On Jan 9, 2015 12:04 AM, "Jeff Steinmetz" wrote:
>>>
>>>> Thank you, that worked.
>>>>
>>>> I was curious about the speed - is running a script using _source
>>>> slower than doc[]?
>>>>
>>>> Totally understand a dynamic script is slower regardless of _source
>>>> vs doc[].
>>>>
>>>> Makes sense that having a count transformed up front during index to
>>>> create a materialized value would certainly be much faster.
>>>>
>>>> On Thursday, January 8, 2015 at 7:04:40 PM UTC-8, Nikolas Everett wrote:
>>>>>
>>>>> On Thu, Jan 8, 2015 at 9:09 PM, Jeff Steinmetz wrote:
>>>>>
>>>>>> Is there a better way to do this?
>>>>>>
>>>>>> Please see this gist (or even better yet, run the script locally
>>>>>> to see the issue):
>>>>>>
>>>>>> https://gist.github.com/jeffsteinmetz/2ea8329c667386c80fae
>>>>>>
>>>>>> You must have scripting enabled in your elasticsearch config for
>>>>>> this to work.
>>>>>>
>>>>>> This was originally based on some comments I found here:
>>>>>> http://stackoverflow.com/questions/17314123/search-by-size-of-object-type-field-elastic-search
>>>>>>
>>>>>> We would like to use a filtered query to only include documents
>>>>>> that have a small count of items in the list [aka array],
>>>>>> filtering where values.size() < 10:
>>>>>>
>>>>>> "script": "doc['titles'].values.size() < 10"
>>>>>>
>>>>>> Turns out values.size() either counts tokenized (analyzed) words
>>>>>> or, if the mapping turns off analysis, it still counts incorrectly
>>>>>> if there are duplicates.
Re: counting items in a list [array] returns (what we think) are incorrect counts via groovy
Transform worked well. Nice.

Curious how to get it to save to source? Tried this below, no go. (I can however do range queries against title_count, so the transform was indexed and works well.)

"transform" : {
  "script" : "ctx._source['title_count'] = ctx._source['titles'].size()",
  "lang" : "groovy"
},
"properties" : {
  "titles" : { "type" : "string", "index" : "not_analyzed" },
  "title_count" : { "type" : "integer", "store" : "yes" }
}
}'

On Thursday, January 8, 2015 at 9:15:28 PM UTC-8, Nikolas Everett wrote:
>
> Source is going to be pretty slow, yeah. If it's a one-off then it's
> probably fine, but if you do it a lot it's probably best to index the
> count.
> On Jan 9, 2015 12:04 AM, "Jeff Steinmetz" wrote:
>
>> Thank you, that worked.
>>
>> I was curious about the speed - is running a script using _source
>> slower than doc[]?
>>
>> Totally understand a dynamic script is slower regardless of _source vs
>> doc[].
>>
>> Makes sense that having a count transformed up front during index to
>> create a materialized value would certainly be much faster.
>>
>> On Thursday, January 8, 2015 at 7:04:40 PM UTC-8, Nikolas Everett wrote:
>>>
>>> On Thu, Jan 8, 2015 at 9:09 PM, Jeff Steinmetz wrote:
>>>
>>>> Is there a better way to do this?
>>>>
>>>> Please see this gist (or even better yet, run the script locally to
>>>> see the issue):
>>>>
>>>> https://gist.github.com/jeffsteinmetz/2ea8329c667386c80fae
>>>>
>>>> You must have scripting enabled in your elasticsearch config for
>>>> this to work.
>>>>
>>>> This was originally based on some comments I found here:
>>>> http://stackoverflow.com/questions/17314123/search-by-size-of-object-type-field-elastic-search
>>>>
>>>> We would like to use a filtered query to only include documents that
>>>> have a small count of items in the list [aka array], filtering where
>>>> values.size() < 10:
>>>>
>>>> "script": "doc['titles'].values.size() < 10"
>>>>
>>>> Turns out values.size() either counts tokenized (analyzed) words or,
>>>> if the mapping turns off analysis, it still counts incorrectly if
>>>> there are duplicates.
>>>> If analyze is not turned off, it counts tokenized words, not the
>>>> number of elements in the list.
>>>> If analyze is turned off for a given field, it improves, but
>>>> duplicates are missed.
>>>>
>>>> For example, this comes back as size == 2:
>>>> "titles": ["one", "duplicate", "duplicate"]
>>>> This comes back as size == 3, but should be 4:
>>>> "titles": ["http://bit.ly/abc", "http://bit.ly/abc", "http://bit.ly/def", "http://bit.ly/ghi"]
>>>>
>>>> Is this a bug, is there a better way, or is this just something that
>>>> we don't understand about Groovy and values.size()?
>>>>
>>> I think that's just the way doc[] works. Try (but don't actually
>>> deploy) _source['titles'].size() < 10. That should do what you
>>> expect. Don't deploy that because it's too slow. Try indexing the
>>> size and filtering on it. You can use a transform to add the size of
>>> the array as an integer field and just filter on it using a range
>>> filter. That'd probably be the fastest option.
>>>
>>> Nik
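Since transform output never lands in _source, the query-time alternative Nik mentions is a script field. A sketch (assuming an already-connected ES 1.x Java Client named client, dynamic scripting enabled, and a made-up index name):

    // compute the count per hit as a script field instead of storing it
    val response = client.prepareSearch("myindex")
      .addScriptField("title_count", "groovy", "_source['titles'].size()", null)
      .execute()
      .actionGet()

As noted in the thread, _source access in scripts is slow, so the indexed title_count from the transform remains the better option when you need to filter on the count.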
Re: counting items in a list [array] returns (what we think) are incorrect counts via groovy
Thank you, that worked.

I was curious about the speed - is running a script using _source slower than doc[]?

Totally understand a dynamic script is slower regardless of _source vs doc[].

Makes sense that having a count transformed up front during index to create a materialized value would certainly be much faster.

On Thursday, January 8, 2015 at 7:04:40 PM UTC-8, Nikolas Everett wrote:
>
> On Thu, Jan 8, 2015 at 9:09 PM, Jeff Steinmetz wrote:
>
>> Is there a better way to do this?
>>
>> Please see this gist (or even better yet, run the script locally to
>> see the issue):
>>
>> https://gist.github.com/jeffsteinmetz/2ea8329c667386c80fae
>>
>> You must have scripting enabled in your elasticsearch config for this
>> to work.
>>
>> This was originally based on some comments I found here:
>> http://stackoverflow.com/questions/17314123/search-by-size-of-object-type-field-elastic-search
>>
>> We would like to use a filtered query to only include documents that
>> have a small count of items in the list [aka array], filtering where
>> values.size() < 10:
>>
>> "script": "doc['titles'].values.size() < 10"
>>
>> Turns out values.size() either counts tokenized (analyzed) words or,
>> if the mapping turns off analysis, it still counts incorrectly if
>> there are duplicates.
>> If analyze is not turned off, it counts tokenized words, not the
>> number of elements in the list.
>> If analyze is turned off for a given field, it improves, but
>> duplicates are missed.
>>
>> For example, this comes back as size == 2:
>> "titles": ["one", "duplicate", "duplicate"]
>> This comes back as size == 3, but should be 4:
>> "titles": ["http://bit.ly/abc", "http://bit.ly/abc", "http://bit.ly/def", "http://bit.ly/ghi"]
>>
>> Is this a bug, is there a better way, or is this just something that
>> we don't understand about Groovy and values.size()?
>>
> I think that's just the way doc[] works. Try (but don't actually
> deploy) _source['titles'].size() < 10. That should do what you expect.
> Don't deploy that because it's too slow. Try indexing the size and
> filtering on it. You can use a transform to add the size of the array
> as an integer field and just filter on it using a range filter. That'd
> probably be the fastest option.
>
> Nik
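Once the count is indexed via the transform, the filtering side is cheap. A sketch of the range filter Nik describes, on the ES 1.x Java client (title_count being the integer field the transform populates):

    import org.elasticsearch.index.query.{FilterBuilders, QueryBuilders}

    // only documents whose indexed title_count is below 10
    val query = QueryBuilders.filteredQuery(
      QueryBuilders.matchAllQuery(),
      FilterBuilders.rangeFilter("title_count").lt(10))

The range filter runs against the indexed value, so no script executes at query time.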
counting items in a list [array] returns (what we think) are incorrect counts via groovy
Is there a better way to do this?

Please see this gist (or even better yet, run the script locally to see the issue):

https://gist.github.com/jeffsteinmetz/2ea8329c667386c80fae

You must have scripting enabled in your elasticsearch config for this to work.

This was originally based on some comments I found here:
http://stackoverflow.com/questions/17314123/search-by-size-of-object-type-field-elastic-search

We would like to use a filtered query to only include documents that have a small count of items in the list [aka array], filtering where values.size() < 10:

"script": "doc['titles'].values.size() < 10"

Turns out values.size() either counts tokenized (analyzed) words or, if the mapping turns off analysis, it still counts incorrectly if there are duplicates.
If analyze is not turned off, it counts tokenized words, not the number of elements in the list.
If analyze is turned off for a given field, it improves, but duplicates are missed.

For example, this comes back as size == 2:
"titles": ["one", "duplicate", "duplicate"]
This comes back as size == 3, but should be 4:
"titles": ["http://bit.ly/abc", "http://bit.ly/abc", "http://bit.ly/def", "http://bit.ly/ghi"]

Is this a bug, is there a better way, or is this just something that we don't understand about Groovy and values.size()?
Re: ElasticSearch spark esRDD not returning the aggregate values in aggregated query
Siva,

Try the latest build of elasticsearch-hadoop, ver 2.1.0.Beta2:
http://www.elasticsearch.org/overview/hadoop/download/

The esRDD has been changed to Spark's PairRDD:
https://spark.apache.org/docs/1.1.0/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions

The RDD will now contain key/value tuples that look like (String, Map[String, Any]), so you could start to walk the JSON key/value hierarchy with something like:

esRDD.flatMap { args => args._2.get("aggregations") }

(The syntax above is not exact, since your specific query result may have a different key/value pair as its first object.)

Best,
Jeff Steinmetz
Director of Data Science
Ekho, Inc.
www.ekho.me
@jeffsteinmetz

On Wednesday, September 17, 2014 6:13:37 AM UTC-7, siva pradeep wrote:
>
> Hi,
>
> I have a query which filters the rows and then applies the aggregation.
> I tried running the query in "Sense" and it gave me the expected
> result. But when I try to run the same query using
> elasticsearch-spark_2.10, I get the rows filtered by the query but not
> the aggregation result. I am sure I am missing something but am unable
> to figure out what.
>
> Here is the query:
>
> GET _search
> {
>   "query" : {
>     "bool": {
>       "must": [
>         {
>           "filtered": {
>             "query": {
>               "range": {
>                 "@timestamp": {
>                   "from": "2014-09-03T01:40:37.437Z",
>                   "to": "2014-09-03T01:45:11.437Z"
>                 }
>               }
>             }
>           }
>         }
>       ]
>     }
>   },
>
>   "size": 0,
>
>   "fields": ["cid", "entity"],
>   "aggs": {
>     "cid": {
>       "terms": {
>         "field": "cid",
>         "min_doc_count": 2,
>         "size": 100
>       },
>       "aggs": {
>         "tn": {
>           "terms": {
>             "field": "entity"
>           }
>         }
>       }
>     }
>   }
> }
>
> Query result:
>
> {
>   "took": 10005,
>   "timed_out": false,
>   "_shards": {
>     "total": 10,
>     "successful": 10,
>     "failed": 0
>   },
>   "hits": {
>     "total": 2430,
>     "max_score": 0,
>     "hits": []
>   },
>   "aggregations": {
>     "cid": {
>       "buckets": [
>         {
>           "key": "01abcecc9a20cd3d6ae6be3509d014ba@76.96.107.168",
>           "doc_count": 2,
>           "tn": {
>             "buckets": [
>               {
>                 "key": "15052563268",
>                 "doc_count": 2
>               }
>             ]
>           }
>         }
>       ]
>     }
>   }
> }
>
> Spark program:
>
> object PresenceFilter extends App {
>
>   val query: String = "{ \"query\" : { \"bool\": { \"must\": [ { \"filtered\": { \"query\": { \"range\": { \"@timestamp\": { \"from\": \"2014-09-03T01:40:37.437Z\", \"to\": \"2014-09-03T01:45:11.437Z\" } } } } } ] } }, \"size\": 0, \"fields\": [\"cid\",\"entity\"], \"aggs\": { \"cid\": { \"terms\": { \"field\": \"cid\", \"min_doc_count\": 2, \"size\": 100 }, \"aggs\": { \"tn\": { \"terms\": { \"field\": \"entity\" } } } } } }"
>
>   val sparkConf = new SparkConf()
>     .setAppName("PresenceAnalysis")
>     .setMaster("local[4]")
>     .set("es.nodes", "prs-wch-10.sys.comcast.net")
>     .set("es.port", "9200")
>     .set
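To make the new PairRDD shape concrete, a short sketch (assuming es-hadoop 2.1.0.Beta2+, an existing SparkContext named sc, a made-up index/type name, and the query string from the thread in a variable named query):

    import org.elasticsearch.spark._  // adds esRDD to SparkContext

    // each element is (documentId, Map[fieldName -> value])
    val rdd = sc.esRDD("myindex/mytype", query)
    rdd.take(5).foreach { case (id, fields) =>
      println(s"$id -> ${fields.keys.mkString(", ")}")
    }

Inspecting the keys of the returned maps this way is a quick check of what the connector actually hands back for a given query.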
Re: org.elasticsearch.search.aggregations docs
Thank you. Although I was specifically talking about documentation for the Java search API.

For example, there is this:
http://www.elasticsearch.org/guide/en/elasticsearch/client/java-api/current/java-facets.html

But I haven't found anything that covers the Aggregations replacement.

On Monday, August 11, 2014 4:08:41 PM UTC-7, Isabel Drost-Fromm wrote:
>
> On Mon, Aug 11, 2014 at 10:47 PM, Jeff Steinmetz wrote:
>
>> I've been looking all over the place for documentation on
>>
>> org.elasticsearch.search.aggregations._
>>
>
> There's quite a bit of information in the online docs:
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations.html
>
> Hope this helps you.
>
> Isabel
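Absent Java API docs for aggregations at the time, a small sketch of the aggregations equivalent of a terms facet on the ES 1.x Java client (client is an existing Client; index and field names are made up):

    import scala.collection.JavaConverters._
    import org.elasticsearch.search.aggregations.AggregationBuilders
    import org.elasticsearch.search.aggregations.bucket.terms.Terms

    // terms aggregation - the rough replacement for a terms facet
    val response = client.prepareSearch("myindex")
      .setSize(0)
      .addAggregation(AggregationBuilders.terms("by_user").field("userid").size(10))
      .execute()
      .actionGet()

    // fetch the result by name, typed to the expected aggregation class
    val byUser: Terms = response.getAggregations.get("by_user")
    for (bucket <- byUser.getBuckets.asScala)
      println(s"${bucket.getKey}: ${bucket.getDocCount}")

The pattern generalizes: build with AggregationBuilders, attach via addAggregation, then read the typed result out of response.getAggregations.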
Creating filters per aggregation similar to Facets
Kibana provides a good example of date histograms, split out by each "query" entered at the top in the "Query" bar. It essentially creates multiple free-text queries against _all. I see it generates a per-facet filter with a free-text (query_string) search.

Since facets are to be deprecated, I am now only using aggregations (in a custom application - unrelated to Kibana). I have tried this with aggregations without success. I also realize there is something new coming in 1.4, but I assume that with multiple aggregations (vs. multiple filters to create multiple buckets) I can do this today.

Here is an oversimplified version of the date histogram aggregation I have (without the leading query section - consider it pseudo code). The "filter" section is the part in question. Removing the filter works; I have tried all types of "filter" formats, looked for samples, etc., with no luck. I have tried {"all" : "search term"} as well as {"query_string": { "all" : "search term" }}. I've tried a specific field name, etc. None of my attempts are proving fruitful.

Pseudo example using aggregations:

"aggregations" : {
  "0" : {
    "date_histogram" : {
      "filter" : { "query_string" : { "query" : "Intel" } },
      "field" : "created_at",
      "interval" : "1d",
      "min_doc_count" : 0
    }
  },
  "1" : {
    "date_histogram" : {
      "filter" : { "query_string" : { "query" : "Samsung" } },
      "field" : "created_at",
      "interval" : "1d",
      "min_doc_count" : 0,
      "pre_zone" : "-02:00",
      "post_zone" : "-03:30"
    }
  }
}

Here is the facet version (which works - note filtered/query/query_string/query):

{
  "facets": {
    "0": {
      "date_histogram": {
        "field": "created_at",
        "interval": "3h"
      },
      "global": true,
      "facet_filter": {
        "fquery": {
          "query": {
            "filtered": {
              "query": {
                "query_string": { "query": "Intel" }
              },
              "filter": {
                "bool": {
                  "must": [
                    { "terms": { "userid": ["53d02d6aed9597f3c6fa"] } },
                    { "range": { "created_at": { "from": "now-30d", "to": "now" } } }
                  ]
                }
              }
            }
          }
        }
      }
    },
    "1": {
      "date_histogram": {
        "field": "created_at",
        "interval": "3h"
      },
      "global": true,
      "facet_filter": {
        "fquery": {
          "query": {
            "filtered": {
              "query": {
                "query_string": { "query": "Samsung" }
              },
              "filter": {
                "bool": {
                  "must": [
                    { "terms": { "userid": ["53d02d6aed9597f3c6fa"] } },
                    { "range": { "created_at": { "from": "now-30d", "to": "now" } } }
                  ]
                }
              }
            }
          }
        }
      }
    }
  }
}
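The shape that works in 1.x is to make the filter its own bucketing aggregation and nest the date_histogram under it, rather than putting "filter" inside the date_histogram body. A sketch on the ES 1.x Java client (aggregation and field names taken from the examples above; everything else assumed):

    import org.elasticsearch.index.query.{FilterBuilders, QueryBuilders}
    import org.elasticsearch.search.aggregations.AggregationBuilders
    import org.elasticsearch.search.aggregations.bucket.histogram.DateHistogram

    // one filter bucket per free-text query, each with its own date_histogram
    def perQueryHistogram(name: String, queryString: String) =
      AggregationBuilders.filter(name)
        .filter(FilterBuilders.queryFilter(QueryBuilders.queryString(queryString)))
        .subAggregation(
          AggregationBuilders.dateHistogram(name + "_over_time")
            .field("created_at")
            .interval(DateHistogram.Interval.DAY)
            .minDocCount(0))

    val aggs = Seq(perQueryHistogram("0", "Intel"), perQueryHistogram("1", "Samsung"))

The same nesting applies to the JSON form: a top-level "filter" aggregation wrapping the query_string, with the date_histogram in its "aggs" block, mirroring what facet_filter did for facets.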
org.elasticsearch.search.aggregations docs
I've been looking all over the place for documentation on org.elasticsearch.search.aggregations._

The Java API 1.x docs state Facets will be deprecated. I am using the source code on GitHub for reference. I also see that Kibana does all its queries using Facets.

I wanted to make sure I wasn't missing something. Searched everywhere.