Re: Questions about dedicated master client node

2015-05-30 Thread James Macdonald
If it is good enough for you, it is good enough for you. I will just give
you one anecdote: we implemented 3 dedicated client nodes on a 9-data-node
cluster and got a 2x performance improvement. Moving query coordination,
network I/O (the coordinating node has to receive data from every shard),
and the combination of results (aggs and sorts) off the nodes providing
the results is very helpful.

James
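
To make the coordination work concrete, here is a minimal Python sketch of
the scatter-gather merge that a coordinating (client) node performs for a
sorted search. It is only an illustration: the function name, the document
ids, and the scores are hypothetical, and the real Elasticsearch
implementation is considerably more involved.

```python
import heapq
from itertools import islice


def coordinate_top_k(shard_results, k):
    """Merge per-shard hit lists into one global top-k list.

    Each shard returns (score, doc_id) pairs already sorted by descending
    score, so the coordinator only needs a k-way merge.
    """
    # heapq.merge expects ascending order, so merge on the negated score.
    merged = heapq.merge(*shard_results, key=lambda hit: -hit[0])
    return list(islice(merged, k))


# Hypothetical responses from three shards.
shard_results = [
    [(9.1, "doc-a"), (4.0, "doc-b")],
    [(7.5, "doc-c"), (7.2, "doc-d")],
    [(8.8, "doc-e")],
]
top3 = coordinate_top_k(shard_results, 3)
print(top3)
```

Every shard's response has to arrive at, and be held in memory by, whichever
node runs this merge, which is the work a dedicated client node takes away
from the data nodes.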



On Sat, May 30, 2015 at 9:11 AM, Xudong You xudong@gmail.com wrote:

 Thanks Nikolas,
 What do you think about a dedicated client node (the so-called load
 balancing node)? Is there any benefit to a dedicated client node? It seems
 to me that round-robin to the data nodes is good enough.

 On Friday, May 29, 2015 at 10:55:01 PM UTC+8, Nikolas Everett wrote:

 Dedicated master nodes are super convenient if you have the IT
 infrastructure to host them on shared machines, because they are very low
 load and it's useful to be able to restart the master nodes quickly. We
 don't have that kind of infrastructure and our cluster is pretty large;
 not having it has bitten us once or twice, but it's not a huge problem.



 On Fri, May 29, 2015 at 10:44 AM, Xudong You xudon...@gmail.com wrote:

 Right now we only need 4 ES nodes due to the small data volume, and all
 4 nodes are combined master and data nodes.

 Q1:
 I am wondering whether, in this case, it is necessary to have dedicated
 master and client nodes. Is there any benefit to having dedicated master
 nodes?

 Someone said that dedicated master nodes (say, three master nodes) are
 helpful for avoiding the split-brain issue, but even if we have NO
 dedicated master nodes, we can still avoid split brain by setting
 discovery.zen.minimum_master_nodes to an appropriate value.
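
For reference, the commonly recommended value for
discovery.zen.minimum_master_nodes is a strict majority (quorum) of the
master-eligible nodes. A minimal Python sketch of that arithmetic follows;
the function name is hypothetical and is used only for illustration.

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Return the quorum size: a strict majority of master-eligible nodes."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


# Four combined master/data nodes, as in the question above.
print(minimum_master_nodes(4))
# Three dedicated master nodes.
print(minimum_master_nodes(3))
```

With four master-eligible nodes the quorum is 3, so the cluster tolerates
the loss of one of them; three dedicated masters with a quorum of 2 tolerate
the same single failure.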

 Q2:
 Similarly, is a dedicated client node really necessary in our 4-node
 case? Is there any benefit to allocating a dedicated client node?

 Thanks!

 --
 Please update your bookmarks! We have moved to
 https://discuss.elastic.co/
 ---
 You received this message because you are subscribed to the Google
 Groups elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send
 an email to elasticsearc...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/de7db788-a6d2-48c2-934b-bc5f7ae311a9%40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/de7db788-a6d2-48c2-934b-bc5f7ae311a9%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.






Re: Aggregation profiling?

2015-05-28 Thread James Macdonald
I don't have an answer, but I really like this question. I too would love
to see more query and aggregation profiling tools for performance
optimization purposes.

Also, I assume you have already looked at this, but have you made sure you
are not evicting anything from your in-memory field data?

James

On Mon, May 25, 2015 at 4:08 PM, Mike Sukmanowsky 
mike.sukmanow...@gmail.com wrote:

 I don't believe there are any current endpoints in the API that support
 this, but are there plans to add better profiling information to ES
 aggregation queries? We'll see some agg queries return in 11s, then 5s
 then 11s again. Sometimes we can see associated filter cache expirations,
 but it's really hard to line these up to one specific query in our
 production environment since multiple users are executing queries
 simultaneously.

 It'd be really helpful to optionally see where aggregation queries are
 spending the bulk of their time to help us understand what to improve in
 the future.

 Anything we can do here right now?

 --
 Mike Sukmanowsky
 Aspiring Digital Carpenter

 *e*: mike.sukmanow...@gmail.com

 facebook http://facebook.com/mike.sukmanowsky | twitter
 http://twitter.com/msukmanowsky | LinkedIn
 http://www.linkedin.com/profile/view?id=10897143 | github
 https://github.com/msukmanowsky





Re: Shard query cache size

2015-05-23 Thread James Macdonald
Hi Adrien, thanks very much for this clarification.  I am always trying to
learn more about how Elasticsearch works, and that clarification was very
helpful.

James

On Thu, May 21, 2015 at 6:12 PM, Adrien Grand adr...@elastic.co wrote:

 On Thu, May 21, 2015 at 11:49 PM, James Macdonald 
 james.macdon...@geofeedia.com wrote:

 Hi, I am a little confused by your response. Are you saying that
 query/filter caches are invalidated across all data in a shard every time
 the refresh interval ticks over?


 Sorry for the confusion:
  - the query cache caches entire requests per index, and is completely
 invalidated across all data every time the refresh interval ticks over AND
 there have been changes since the last refresh
  - the filter cache caches matching documents per segment; it is
 invalidated per segment only when a segment goes away (typically because
 it's been merged into a larger segment), which is infrequent for large
 segments
  - the fielddata cache caches the document-value mapping per segment and
 has the same invalidation rules as the filter cache
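
The difference between the two invalidation rules can be modelled with a
small toy in Python. This is only a sketch of the behaviour described in
the list above, not of Elasticsearch internals: the class, its methods, and
the cache keys are all hypothetical.

```python
class ToyShard:
    """Toy model of request-level versus per-segment cache invalidation."""

    def __init__(self):
        self.dirty = False       # True if documents changed since last refresh
        self.request_cache = {}  # request -> cached response
        self.segment_cache = {}  # (segment, filter) -> cached matching docs

    def index_document(self):
        self.dirty = True

    def refresh(self):
        # The request-level cache is dropped entirely, but only when
        # something changed since the previous refresh.
        if self.dirty:
            self.request_cache.clear()
            self.dirty = False

    def merge_away(self, segment):
        # Per-segment entries are dropped only when their segment goes away.
        self.segment_cache = {
            key: value
            for key, value in self.segment_cache.items()
            if key[0] != segment
        }


shard = ToyShard()
shard.request_cache["agg-request"] = "cached response"
shard.segment_cache[("seg-large", "status:ok")] = "cached doc set"

shard.refresh()                    # no changes: both caches survive
survived_idle_refresh = "agg-request" in shard.request_cache

shard.index_document()
shard.refresh()                    # changes: request cache is emptied
print(shard.request_cache, shard.segment_cache)

shard.merge_away("seg-large")      # segment merged away: its entries go
print(shard.segment_cache)
```

Under constant indexing load the request-level cache is emptied at every
refresh, while entries tied to large, rarely merged segments stay valid.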


 I was under the impression that all field data and caching related
 operations were performed on a Lucene index segment level and that the
 caches would only be invalidated for a given segment if that segment had
 changed since the last refresh. Since most data is stored in large segments
 that don't take fresh writes and seldom merge this would mean that most
 caches are good for long periods of time; even if the shard is under
 constant indexing load. Am I mistaken?


 This is right for the fielddata and filter caches, but not for the query
 cache.

 --
 Adrien





Re: Shard query cache size

2015-05-21 Thread James Macdonald
Hi, I am a little confused by your response. Are you saying that
query/filter caches are invalidated across all data in a shard every time
the refresh interval ticks over?

I was under the impression that all field data and caching related
operations were performed on a Lucene index segment level and that the
caches would only be invalidated for a given segment if that segment had
changed since the last refresh. Since most data is stored in large segments
that don't take fresh writes and seldom merge this would mean that most
caches are good for long periods of time; even if the shard is under
constant indexing load. Am I mistaken?

Thanks,
James

On Thu, May 21, 2015 at 9:28 AM, Adrien Grand adr...@elastic.co wrote:

 It depends how likely it is for you to run the same aggregation again.
 Note that this cache is fully invalidated at every refresh (meaning either
 every second by default, or every time that you update/add/remove documents
 if you perform less than 1 operation per second). So this cache will only
 be used if you are likely to run the exact same request twice or more in a
 short period of time.

 We assumed that this situation is not common, so a small cache would be
 enough for the maybe 4 or 5 requests that would be run again and again. You
 can increase it if you think it will be helpful in your case, although I
 would advise being careful: that memory might be better spent on, e.g., the
 filesystem cache.

 On Thu, May 21, 2015 at 3:41 PM, Mike Sukmanowsky 
 mike.sukmanow...@gmail.com wrote:

 Hi all,

 We store Marvel-style timeseries data in Elasticsearch and make very
 heavy use of aggregations (all queries are effectively aggregations).

 We've been playing around with the shard query cache and have a question.

 Is there a reason the shard query cache is set to such a low level of JVM
 heap by default? 1% seems awfully low unless ES assumes most people aren't
 making heavy use of aggregations? Any harm in us significantly boosting
 this from 1% to say 15% of heap? Most of our machines have 30GB of RAM and
 heap at 50% of that (15GB) so the query cache is 150MB by default. Think
 we'd like to experiment growing that to at least 10% of heap to have 1GB in
 use for this cache.
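
The sizes mentioned above can be checked with a few lines of Python. This
is only arithmetic on the figures in the message (30GB of RAM, heap at 50%);
the function name is hypothetical, and decimal units (1 GB = 1000 MB) are
assumed to match the round numbers quoted.

```python
def cache_size_mb(ram_gb: int, heap_percent: int, cache_percent: int) -> float:
    """Size of a cache given as a percentage of the JVM heap, in MB."""
    heap_mb = ram_gb * 1000 * heap_percent / 100  # assumes 1 GB = 1000 MB
    return heap_mb * cache_percent / 100


for percent in (1, 10, 15):
    size = cache_size_mb(ram_gb=30, heap_percent=50, cache_percent=percent)
    print(f"{percent}% of heap -> {size:.0f} MB")
```

On these assumptions 1% of a 15GB heap is 150MB, 10% is 1500MB, and 15% is
2250MB, so 10% comfortably clears the 1GB target.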

 Mike





 --
 Adrien





Re: Aggregation not limited to filter?

2015-04-10 Thread James Macdonald
I had a similar problem recently and solved it by moving my filter into a
filtered query (leaving the query as a match_all); see the documentation:
http://www.elastic.co/guide/en/elasticsearch/reference/1.5/query-dsl-filtered-query.html

I am not certain why queries restrict the scope of the aggregations but
filters do not. I suspect a filter that is not wrapped in a filtered query
is interpreted as a post_filter:
http://www.elastic.co/guide/en/elasticsearch/reference/1.x/search-request-post-filter.html
Maybe someone else actually knows why.
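
A rough sketch of the two request shapes, written as Python dictionaries in
the Elasticsearch 1.x query DSL. The helper names, the "status" field, and
the "by_user" aggregation are hypothetical and only illustrate the
structure; they are not taken from the original poster's code.

```python
import json


def filtered_query_request(filter_clause, aggs):
    """Filter inside a filtered query: it narrows hits AND aggregations."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": filter_clause,
            }
        },
        "aggs": aggs,
    }


def post_filter_request(filter_clause, aggs):
    """Top-level post_filter: applied to hits only, after aggregations run."""
    return {
        "query": {"match_all": {}},
        "post_filter": filter_clause,
        "aggs": aggs,
    }


# Hypothetical filter and aggregation.
status_filter = {"term": {"status": "active"}}
by_user = {"by_user": {"terms": {"field": "user"}}}

scoped = filtered_query_request(status_filter, by_user)
unscoped = post_filter_request(status_filter, by_user)
print(json.dumps(scoped, indent=2))
```

In the first shape the aggregation only sees documents that pass the filter;
in the second the doc counts cover everything the query matched, which
matches the behaviour described in the question below.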


Hope that helps,
James

On Fri, Apr 10, 2015 at 11:39 AM, James Green james.mk.gr...@gmail.com
wrote:

 I must be doing something stupid!

 Using the Java client I can perform a search with a filter and iterate
 over the hits. I see exactly the right source documents.

 If I add an aggregation, I see the expected keyAsText string but the
 docCount reflects the count as if the filter had not been applied.

 I expected the aggregation to be restricted to the results within that
 filter?

 Thanks,

 James


