Re: Is java elasticsearch a joke?

2015-03-25 Thread Perryn Fowler
On Thu, Mar 26, 2015 at 6:47 AM, Sai Asuka  wrote:

> telnet localhost:9300


should be

telnet localhost 9300

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAFps6aCoJs%3DT2MxAP_MrrCHUCHEaAtP%3DOKHDx%3Dt5-AwUR%3Din8w%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: filtered has_child query?

2015-03-11 Thread Perryn Fowler
'apply the filter before the query' doesn't make any sense to me - what
would it filter? I suspect I'm not really understanding you, can you tell
me more? Why do you want to be able to do this? How would it help?


Anyway, from what I think I do understand, there are several ways to get the
results you want. Which one you choose probably depends on how you want the
results scored, and possibly on performance considerations.

here is one way to try. (If you want to filter both then you need to ...
apply the filter to both :))
something like:

{
   "query": {
      "filtered": {
         "query": {
            "has_child": {
               "type": "Bar",
               "query": {
                  "filtered": {
                     "query": {
                        "term": {
                           "bar": "xyz"
                        }
                     },
                     "filter": {
                        "term": {
                           "access": "yes"
                        }
                     }
                  }
               }
            }
         },
         "filter": {
            "term": {
               "access": "yes"
            }
         }
      }
   }
}
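
Since the same filter has to appear twice, it may be easier to build the request body in code so the filter is only written once. A minimal Python sketch, assuming the 1.x `filtered` query syntax used above; the helper name `filtered_has_child` is made up for illustration and is not part of any client library:

```python
import json


def filtered_has_child(child_type, child_query, shared_filter):
    """Build a parent search where shared_filter is applied to both
    the parent documents and the child documents."""
    return {
        "query": {
            "filtered": {
                "query": {
                    "has_child": {
                        "type": child_type,
                        # Children must match the query AND pass the filter.
                        "query": {
                            "filtered": {
                                "query": child_query,
                                "filter": shared_filter,
                            }
                        },
                    }
                },
                # Parents must pass the same filter.
                "filter": shared_filter,
            }
        }
    }


access = {"term": {"access": "yes"}}
body = filtered_has_child("Bar", {"term": {"bar": "xyz"}}, access)
print(json.dumps(body, indent=3))
```

The printed JSON is the same request as the one written out by hand above, and can be sent to the parent type's _search endpoint.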

On Wed, Mar 11, 2015 at 9:15 PM, asanderson 
wrote:

> Actually, I  do want only parent documents returned, but I want the filter
> to be applied to both parent and child documents. Is there a way to specify
> that the filter is to be applied before the query, so that this would be
> possible? If not, how would I rewrite the query to do this?
>


Re: filtered has_child query?

2015-03-10 Thread Perryn Fowler
The query as written will return a result because you are querying for
*Parent* documents that 'have children' matching your has_child query. You
can tell because the type in the url will be 'Foo'.

Hence, the filter you have specified is not run against the children, but
against the *parents*. In your example the parent document does indeed have
an access:yes field and does not get filtered out.

It is probably possible to do what you want, but it depends on whether you
are trying to retrieve parent or child documents.

To get parent documents, just add your filter criteria to the has_child
query.

To get child documents, use 'Bar' in the url and take a look into the
has_parent filter/query
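
For the child-document case, a rough sketch of what the request body could look like, built in Python. It assumes the 1.x `has_parent` query with its `parent_type` parameter and reuses the Foo/Bar example from the question; the helper name is hypothetical:

```python
import json


def children_with_matching_parent(parent_type, parent_query, child_filter):
    """Build a child search: return children that pass child_filter and
    whose parent matches parent_query."""
    return {
        "query": {
            "filtered": {
                "query": {
                    "has_parent": {
                        "parent_type": parent_type,
                        "query": parent_query,
                    }
                },
                # This filter runs against the child documents, because
                # the search targets the child type.
                "filter": child_filter,
            }
        }
    }


body = children_with_matching_parent(
    "Foo", {"term": {"foo": "abc"}}, {"term": {"access": "yes"}}
)
print(json.dumps(body, indent=3))
```

This body would be sent to the child type (for example /foobar/Bar/_search). With the sample data from the question, Bar/2 has access:no, so it would be filtered out.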



On Wed, Mar 11, 2015 at 10:23 AM, asanderson 
wrote:

> Is a filtered has_child query possible where the filter is applied to the
> child document before the query of the has_child?
>
> e.g. Given the example below...
>
> curl -X PUT "http://localhost:9200/foobar" -d \
> "{\"mappings\":{\"Foo\":{},\"Bar\":{\"_parent\":{\"type\":\"Foo\"}}}}"
>
> curl -X PUT "http://localhost:9200/foobar/Foo/1" \
> -d "{\"foo\":\"abc\",\"access\":\"yes\"}"
>
> curl -X PUT "http://localhost:9200/foobar/Bar/2?parent=1" \
> -d "{\"bar\":\"xyz\",\"access\":\"no\"}"
>
> The following filtered query should not return a result, if the filter was
> being applied to the child document first.
>
> {
>    "query": {
>       "filtered": {
>          "query": {
>             "has_child": {
>                "type": "Bar",
>                "query": {
>                   "term": {
>                      "bar": "xyz"
>                   }
>                }
>             }
>          },
>          "filter": {
>             "term": {
>                "access": "yes"
>             }
>          }
>       }
>    }
> }
>
> Please advise.
>


Re: Why is has_parent so slow? and can anything be done?

2015-03-03 Thread Perryn Fowler
Hi Martin,

Thanks very much for your help.

The final product will be indexing new documents at the same time as
querying, but thus far for my performance trials I am performing
queries/aggs only. I assume therefore that enabling eager global ordinals
would not help with the performance issues I am seeing. (as an aside, if I
do enable global ordinals by updating the mapping, do I need to re-index
everything for it to take effect?)

I was using a has_parent filter, so it was my understanding that score_mode
was irrelevant? (I did manage to get slightly better performance using a
has_parent query wrapped in a constant_score query - I will try score_mode)

In general, I think my situation is the perfect use case for parent/child.
I have a lot of child documents with immutable data, and far fewer parent
documents with changeable data. I want to be able to aggregate across the
child documents using buckets derived from fields on the parents, e.g. find
the average of 'reading' (child document) in each 'location' (parent
document).

Quite often, the 'location' is recorded incorrectly and needs to be
updated, which makes de-normalisation infeasible since all the child
documents would need to be updated (and there are millions)

I am finding though, that any use of the parent/child relationship
(has_parent, has_child, children aggregation)  slows down results by an
order of magnitude over queries that only aggregate directly over the child
documents.

If this is to be expected, then I may have to resort to a client side join
approach coupled with 'filters' aggregations to provide bucketing. This
will be significantly more fiddly from a code perspective though, so I just
want to make sure I'm not missing something.
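
To make the client-side join idea concrete, here is a rough Python sketch of the request I have in mind. It assumes the application has already looked up which parent ids belong to each 'location', that a terms filter on "_parent" accepts plain parent ids (as in my earlier experiment), and that the child field is called 'reading'. The function name and the sample ids are made up:

```python
import json


def location_buckets(parent_ids_by_location, value_field="reading"):
    """Build a 'filters' aggregation with one named bucket per location.

    Each bucket selects child documents by their parent id, so no
    has_parent query or children aggregation is involved.
    """
    filters = {
        location: {"terms": {"_parent": sorted(ids)}}
        for location, ids in parent_ids_by_location.items()
    }
    return {
        "size": 0,  # only the aggregation results are needed
        "aggs": {
            "by_location": {
                "filters": {"filters": filters},
                "aggs": {"avg_reading": {"avg": {"field": value_field}}},
            }
        },
    }


# Hypothetical lookup result: location name -> ids of its parent documents.
body = location_buckets({"north": ["p2", "p1"], "south": ["p3"]})
print(json.dumps(body, indent=3))
```

The fiddly part is that the application has to fetch and maintain the location-to-parent-id mapping itself before every request.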

cheers
Perryn

On Wed, Mar 4, 2015 at 1:05 AM, Martijn v Groningen <
martijn.v.gronin...@gmail.com> wrote:

> Are you also adding/modifying documents while searching with has_parent or
> has_child query?
> In that case it makes sense to enable global ordinals loading on the _parent
> field:
>
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/parent-child-performance.html#_global_ordinals_and_latency
>
> Work is planned to improve the has_child / has_parent
> queries when these queries are part of a bigger query (for example a bool
> query): https://github.com/elasticsearch/elasticsearch/issues/8134
>
> Are you using score_mode? That makes things more expensive, so if you
> don't need it you can turn it off.
>
> Scaling out by adding more nodes does help to improve the query time.
>
> The has_parent / has_child queries come at a performance penalty. When you
> design your documents you should consider whether you can de-normalize your
> data so that you don't need parent/child, which makes your searches fast.
> However, this is sometimes expensive because documents tend to get large or
> the number of documents to be updated makes simple updates from the
> application expensive. In those cases parent/child should be considered.
>
> On 3 March 2015 at 03:12, Perryn Fowler  wrote:
>
>> Further investigation shows that anything that makes use of _parent seems
>> to result in slow queries, be it has_parent, has_child or the 'children'
>> aggregation.
>>
>> I should mention that I am using 1.4.4 - is this to be expected even with
>> the performance improvements made in recent releases?
>>
>>
>> On Mon, Mar 2, 2015 at 12:23 PM, Perryn Fowler 
>> wrote:
>>
>>> Hello,
>>>
>>> I am writing an analytics application that makes heavy use of
>>> aggregations.
>>>
>>> My situation seems suited to parent/child. I have relatively few parents
>>> (hundreds) and a lot more children (tens of millions).
>>>
>>> The has_parent query or filter provides an elegant way to perform the
>>> sort of queries I want, but the problem is they are very slow (several
>>> seconds) compared to those that don't use them (100s of milliseconds).
>>>
>>> If I generate the parent ids on the client side and then use them in a
>>> terms filter on the "_parent" field, things seem to be significantly faster
>>> (although still not ideal).
>>>
>>> The documentation I have read indicates that has_parent can be expected
>>> to be slow, but most suggested mitigations seem to be about reducing memory
>>> usage rather than speeding up queries.
>>>
>>> I am loath to give up on a functionally elegant solution. Why is
>>> has_parent so slow? Is there anything I could try to speed has_parent up?
>>> Should scaling out to more nodes help in this situation?
>>>
>>> cheers
>>> Perryn
>

Re: Why is has_parent so slow? and can anything be done?

2015-03-02 Thread Perryn Fowler
Further investigation shows that anything that makes use of _parent seems
to result in slow queries, be it has_parent, has_child or the 'children'
aggregation.

I should mention that I am using 1.4.4 - is this to be expected even with
the performance improvements made in recent releases?


On Mon, Mar 2, 2015 at 12:23 PM, Perryn Fowler 
wrote:

> Hello,
>
> I am writing an analytics application that makes heavy use of aggregations.
>
> My situation seems suited to parent/child. I have relatively few parents
> (hundreds) and a lot more children (tens of millions).
>
> The has_parent query or filter provides an elegant way to perform the sort
> of queries I want, but the problem is they are very slow (several seconds)
> compared to those that don't use them (100s of milliseconds).
>
> If I generate the parent ids on the client side and then use them in a
> terms filter on the "_parent" field, things seem to be significantly faster
> (although still not ideal).
>
> The documentation I have read indicates that has_parent can be expected to
> be slow, but most suggested mitigations seem to be about reducing memory
> usage rather than speeding up queries.
>
> I am loath to give up on a functionally elegant solution. Why is
> has_parent so slow? Is there anything I could try to speed has_parent up?
> Should scaling out to more nodes help in this situation?
>
> cheers
> Perryn
>



Why is has_parent so slow? and can anything be done?

2015-03-01 Thread Perryn Fowler
Hello,

I am writing an analytics application that makes heavy use of aggregations.

My situation seems suited to parent/child. I have relatively few parents
(hundreds) and a lot more children (tens of millions).

The has_parent query or filter provides an elegant way to perform the sort
of queries I want, but the problem is they are very slow (several seconds)
compared to those that don't use them (100s of milliseconds).

If I generate the parent ids on the client side and then use them in a
terms filter on the "_parent" field, things seem to be significantly faster
(although still not ideal).

The documentation I have read indicates that has_parent can be expected to
be slow, but most suggested mitigations seem to be about reducing memory
usage rather than speeding up queries.

I am loath to give up on a functionally elegant solution. Why is
has_parent so slow? Is there anything I could try to speed has_parent up?
Should scaling out to more nodes help in this situation?

cheers
Perryn



Re: Parent child documents query

2015-01-27 Thread Perryn Fowler
You should be able to query the child type with a has_parent query which
has a has_child query nested within it.

No idea how it would perform though.
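
Something like the following sketch, written in Python to build the request body. It assumes the 1.x `has_parent` (`parent_type`) and `has_child` (`type`) parameters; the type names "FakeDoc" and "Doc" and the example term query are placeholders, since the real mapping isn't shown in the thread:

```python
import json


def siblings_of_matching_children(parent_type, child_type, child_query):
    """Build a child search that returns every child whose parent has
    at least one child matching child_query (the match plus its siblings)."""
    return {
        "query": {
            "has_parent": {
                "parent_type": parent_type,
                "query": {
                    # Select the parents that own a matching child ...
                    "has_child": {
                        "type": child_type,
                        "query": child_query,
                    }
                },
            }
        }
    }


# Placeholder criteria standing in for "the query that qualifies Doc1".
body = siblings_of_matching_children(
    "FakeDoc", "Doc", {"term": {"name": "doc1"}}
)
print(json.dumps(body, indent=3))
```

The body would be sent to the child type's _search endpoint, so the hits are child documents only; fetching the parents as well would need a separate query.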

On Sun, Jan 25, 2015 at 3:29 AM, bvnrwork  wrote:

> For example:
>
> I have the three documents below: FakeDoc, Doc1 and Doc2.
>
> How do I write a query that matches Doc1 and also returns all
> documents which have the same parent id as Doc1?
>
> That is, Doc1 and Doc2 in this case.
>
>
> FakeDoc{
>
> F1
>
> }
>
> Doc1
>
> {
>
> _parent:F1
>
> }
>
>
>
> Doc2
>
> {
>
> _parent:F2
>
> }
>
> On Friday, 23 January 2015 15:22:49 UTC-5, bvnrwork wrote:
>>
>>
>> Is there a way to get all children and parents if a parent or child
>> matches a query?
>>


Re: Queries vs Filters

2015-01-13 Thread Perryn Fowler
Thanks vineeth.

I may look at some sort of 'snap to grid' functionality in my app to try
and get at least some re-use of date ranges.
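
Roughly what I mean by 'snap to grid', as a Python sketch: widen the requested range outward to fixed boundaries so that nearby requests produce identical range filters. The 15 minute grid, the function names and the 'timestamp' field name are all arbitrary choices for illustration:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def snap_range(start, end, grid=timedelta(minutes=15)):
    """Widen [start, end) outward to the nearest grid boundaries.

    Both arguments must be timezone-aware datetimes.
    """
    def floor(ts):
        # Drop the remainder since the epoch to land on a grid boundary.
        return ts - ((ts - EPOCH) % grid)

    snapped_start = floor(start)
    snapped_end = floor(end)
    if snapped_end < end:
        snapped_end += grid  # round the end up rather than down
    return snapped_start, snapped_end


def range_filter(field, start, end):
    """Build a range filter body from the snapped bounds."""
    return {"range": {field: {"gte": start.isoformat(), "lt": end.isoformat()}}}


start = datetime(2015, 1, 13, 10, 7, tzinfo=timezone.utc)
end = datetime(2015, 1, 13, 11, 22, tzinfo=timezone.utc)
snapped = snap_range(start, end)
print(range_filter("timestamp", *snapped))
```

The trade-off is that the filter covers slightly more data than the user asked for, so the grid size has to be small enough for that not to matter.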

cheers
Perryn

On Tue, Jan 13, 2015 at 4:41 PM, vineeth mohan 
wrote:

> Hi ,
>
> Yes, Elasticsearch is going to create a filter cache entry per filter.
> But if you want to override this behavior, you can set _cache to
> false in your query as follows -
>
> "filter" : {
>     "fquery" : {
>         "query" : {
>             "query_string" : {
>                 "query" : "this AND that OR thus"
>             }
>         },
>         "_cache" : false
>     }
> }
>
>
> Filter cache disable -
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-filter.html#_caching_15
>
> Thanks
>Vineeth Mohan,
>  Elasticsearch consultant,
>  qbox.io ( Elasticsearch service provider <http://qbox.io/>)
>
>
>
> On Tue, Jan 13, 2015 at 10:23 AM, Perryn Fowler 
> wrote:
>
>> Hello,
>>
>> I am building an application that performs aggregations over time-series
>> data.
>>
>> The prevailing advice for my situation seems to be that I should use
>> filters rather than queries to provide scope for my aggregations. The
>> reasons being
>> 1) I have no need for scoring
>> 2) I will be able to take advantage of filter caching.
>>
>> However, a very common use case is for my users to scope aggregations to
>> a completely arbitrary time range. This means it is relatively unlikely to
>> receive many requests scoped to exactly the same time range.
>>
>> If I implement this using a range filter, what does this mean for filter
>> caching? Is ElasticSearch going to waste time and memory building a separate
>> filter cache for each individual range it sees? (and should I hence use a
>> query?) Or is it smarter than that?
>>
>> cheers
>> Perryn
>>


Queries vs Filters

2015-01-12 Thread Perryn Fowler
Hello,

I am building an application that performs aggregations over time-series
data.

The prevailing advice for my situation seems to be that I should use
filters rather than queries to provide scope for my aggregations. The
reasons being
1) I have no need for scoring
2) I will be able to take advantage of filter caching.

However, a very common use case is for my users to scope aggregations to a
completely arbitrary time range. This means it is relatively unlikely to
receive many requests scoped to exactly the same time range.

If I implement this using a range filter, what does this mean for filter
caching? Is ElasticSearch going to waste time and memory building a separate
filter cache for each individual range it sees? (and should I hence use a
query?) Or is it smarter than that?

cheers
Perryn



Aggregating using 'Dynamic Fields'

2014-12-11 Thread Perryn Fowler
Hello

I have a use case that feels like a good fit for ElasticSearch except for 
one problem. I'm hoping someone might be able to suggest an approach for 
overcoming it using ElasticSearch.

I have a lot of time-series data from sensors. Extremely simplified, a 
reading looks a bit like this

{ "sensor_id": 12345678, "timestamp": 10203454354, "value": 5643 }

I want to do things like calculate the average value for each sensor within 
date buckets for recent history. 

Thus far ElasticSearch seems like an excellent fit (using  an approach 
similar to that described here: 
http://www.elasticsearch.com/guide/en/elasticsearch/guide/current/time-based.html)

The problem is that I need the end user to be able to dynamically group 
sensors into 'categories' via a UI and then do aggregations and filtering 
based on that.
( eg 1: calculate the average value for each category of sensor within date 
buckets for recent history)
( eg 2: as above but filtered to only calculate for category A & B)

If the user moves a particular sensor from one category to another, then 
the system should reflect that when calculating aggregations across 
previous readings.

Some approaches I could take

1) re-index every time a user changes the category structure. This doesn't 
really seem feasible.

2) Resolve categories to sensor_ids in the application and use them to 
filter and bucket in ElasticSearch. Take the result from ElasticSearch and 
re-aggregate in the application.
This seems problematic because
   A) There may be 1000s of sensor_ids in a category. The request
      payload could get quite large.
   B) It seems a shame to have to implement bucketing and
      aggregation in the app when I have ElasticSearch.

3) Filter and aggregate using a function that can map a sensor_id to a
category for each reading.
This would address problem B from approach 2, but
   a) the function would still be large if there are 1000s of
      sensor ids, and
   b) I am unsure of the performance implications of using
      functions this way.
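
For reference, here is a rough Python sketch of a variant of approach 2 in which ElasticSearch still does the bucketing: the application resolves each category to its sensor_ids and sends one named filter per category. It assumes a 'filters' aggregation is available, that 'timestamp' is mapped as a date, and that the category names and ids shown are made up. Problem A (payload size) is not solved by this:

```python
import json


def category_histogram(sensor_ids_by_category, interval="1h"):
    """Build an aggregation giving the average value per category per
    date bucket, with categories resolved to sensor_ids by the caller."""
    filters = {
        category: {"terms": {"sensor_id": sorted(ids)}}
        for category, ids in sensor_ids_by_category.items()
    }
    return {
        "size": 0,  # only aggregation results are needed
        "aggs": {
            "by_category": {
                "filters": {"filters": filters},
                "aggs": {
                    "over_time": {
                        "date_histogram": {
                            "field": "timestamp",
                            "interval": interval,
                        },
                        "aggs": {"avg_value": {"avg": {"field": "value"}}},
                    }
                },
            }
        },
    }


# Hypothetical category structure as the user might define it in the UI.
body = category_histogram({"A": [12345678, 12345679], "B": [22345678]})
print(json.dumps(body, indent=3))
```

Restricting the result to categories A & B (eg 2) then just means leaving the other categories out of the mapping passed in.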

Has anyone done something like this with ElasticSearch? How?

Cheers
Perryn
