While shard was relocating, experienced very high search response times

2015-02-17 Thread Nick Canzoneri
We use rolling daily indexes. After we deleted one, Elasticsearch decided
to relocate a replica shard to another node. Totally fine, that's what I
expect Elasticsearch to do.

What I didn't expect was for search queries that might have used that shard
to bomb out. Writes to different indexes still performed fine, but searches
either timed out or returned with very long response times. Once the shard
completed relocating (15min for a 26gb shard), performance returned to
normal.
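
For scale, those figures work out to an average transfer rate of roughly 29 MB/s. A back-of-the-envelope sketch, assuming "26gb" means 26 * 10^9 bytes (the binary reading would give a slightly higher number):

```python
# Assumption: "26gb" is read as 26 * 10**9 bytes; the message does not say
# whether decimal or binary gigabytes were meant.
SHARD_BYTES = 26 * 10**9
RELOCATION_SECONDS = 15 * 60


def throughput_mb_per_s(total_bytes: int, seconds: int) -> float:
    """Average transfer rate in decimal megabytes per second."""
    return total_bytes / seconds / 10**6


rate = throughput_mb_per_s(SHARD_BYTES, RELOCATION_SECONDS)
print(f"average relocation rate: {rate:.1f} MB/s")
```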

We have all the logs and Marvel data for this time period, and I don't see
anything that seems out of the ordinary. What logs/settings should I be
looking at so we don't have this problem again?

Thanks,


-- 
Nick Canzoneri
Developer, Wildbit http://wildbit.com/
Beanstalk http://beanstalkapp.com/, Postmark http://postmarkapp.com/,
dploy.io

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKWm5yPfQ1dLW-QengDs6gxfP9bjhnH7q7qnR-kYnY7nZu8d_g%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Snapshot delete is executed but API call never returns

2015-01-23 Thread Nick Canzoneri
It looks like it is this issue:
https://github.com/elasticsearch/elasticsearch/issues/8958

On Thu, Jan 22, 2015 at 4:23 PM, Nick Canzoneri n...@wildbit.com wrote:

 Elasticsearch version 1.4.2.
 The repository is of type fs and points to an NFS directory.
 Taking snapshots succeeds just fine.

 When I delete a snapshot, however, I can wait several minutes with no
 response. Almost immediately, though, if I GET the snapshot I get a 404
 in response (expected), and in the repository directory itself the
 folders related to the snapshot have already been deleted.

 The same behavior occurs if I shoot the request at a client node, the
 master or a data node.

 Let me know what more info I should provide.

 Thanks,


Snapshot delete is executed but API call never returns

2015-01-22 Thread Nick Canzoneri
Elasticsearch version 1.4.2.
The repository is of type fs and points to an NFS directory.
Taking snapshots succeeds just fine.

When I delete a snapshot, however, I can wait several minutes with no
response. Almost immediately, though, if I GET the snapshot I get a 404
in response (expected), and in the repository directory itself the
folders related to the snapshot have already been deleted.
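
For reference, the two requests involved look roughly like the following. This is only a sketch: the host, repository name and snapshot name are placeholders, not the real ones, and nothing is actually sent.

```python
from urllib.parse import quote


def snapshot_url(base_url: str, repository: str, snapshot: str) -> str:
    """Build the URL shared by DELETE and GET for a single snapshot."""
    return "{}/_snapshot/{}/{}".format(
        base_url.rstrip("/"),
        quote(repository, safe=""),
        quote(snapshot, safe=""),
    )


# Hypothetical names, for illustration only.
url = snapshot_url("http://localhost:9200", "my_nfs_repo", "snapshot_1")

requests_to_send = [
    ("DELETE", url),  # the call that hangs for several minutes
    ("GET", url),     # answered with a 404 almost immediately afterwards
]

for method, target in requests_to_send:
    print(method, target)
```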

The same behavior occurs if I shoot the request at a client node, the
master or a data node.

Let me know what more info I should provide.

Thanks,


Marvel question: An alternative for having to enable HTTP for data nodes?

2014-12-23 Thread Nick Canzoneri
We're testing out Marvel and noticed that it causes failures unless HTTP is
enabled for the node. This isn't ideal for data nodes that we've disabled
HTTP on.

Is this just the way things work or is there an alternative I'm not aware
of?

It's not a big deal, but it does mean we can't use the Sniffing Connection
Pool that some clients support, because it round-robins across all nodes
capable of HTTP traffic.

Thanks,


Re: Cluster clients

2014-12-11 Thread Nick Canzoneri
Most (all?) of the official clients have connection pool support that will
query the cluster status and round robin across all the nodes with client
capability enabled.

Here's the appropriate link to the Python docs:
http://elasticsearch-py.readthedocs.org/en/master/connection.html#connection-pool
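
To make the idea concrete, here is a toy round-robin pool over the four nodes from the question. It is a simplified illustration of what such a connection pool does, not the elasticsearch-py implementation; the class and method names are made up for this sketch.

```python
class RoundRobinPool:
    """Toy connection pool: rotates over hosts and skips ones marked dead."""

    def __init__(self, hosts):
        if not hosts:
            raise ValueError("at least one host is required")
        self._hosts = list(hosts)
        self._dead = set()
        self._next = 0

    def mark_dead(self, host):
        """Stop handing out a host, e.g. after a connection error."""
        self._dead.add(host)

    def mark_live(self, host):
        """Put a host back into rotation."""
        self._dead.discard(host)

    def get(self):
        """Return the next live host, trying each host at most once."""
        for _ in range(len(self._hosts)):
            host = self._hosts[self._next]
            self._next = (self._next + 1) % len(self._hosts)
            if host not in self._dead:
                return host
        raise RuntimeError("no live hosts")


pool = RoundRobinPool(
    ["http://E1:9200", "http://E2:9200", "http://E3:9200", "http://E4:9200"]
)
pool.mark_dead("http://E2:9200")  # pretend E2 is down
picks = [pool.get() for _ in range(4)]
print(picks)
```

With E2 marked dead, requests simply rotate over E1, E3 and E4, so no client has to be pinned to a single node.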

Cheers,

On Thu, Dec 11, 2014 at 2:28 PM, Morten Guldager morten.gulda...@gmail.com
wrote:

 I have just started with Elasticsearch and have set up a cluster with 4
 data/master nodes, everything pretty much default. The nodes are called E1,
 E2, E3 and E4.

 I have implemented a few pieces of client software, and doing RESTful
 communication against http://E1:9200/ is super easy.

 But how are the clients supposed to address the cluster? Pointing directly
 at a specific cluster node doesn't seem right, since that particular node
 might be down. Configuring all clients with knowledge of all cluster nodes
 seems impractical too.

 Of course I could set up old-school round-robin DNS. Is that the way to do
 it, or do we have smarter options?

 Ah yes, I'm using Python and the elasticsearch module. Everything is on
 Linux.


 /mogul


Re: Term filter causes memory to spike drastically

2014-11-21 Thread Nick Canzoneri
This is something that I just discovered as well.

A top-level filter is really a post_filter (it was renamed to post_filter in
later versions of ES):
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-post-filter.html

So this executes the query first (a default "match_all": {}, since you
didn't specify one) and then applies the filter to that result set. That is
not very efficient for your request; I suspect you expected the filter to
act as a pre-filter that removes documents *before* the query runs.

To do that, you need to use a filtered query:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filtered-query.html

In your case, the resulting query would look like:

curl -XGET 'http://localhost:9200/my-index/my-doc-type/_search' -d '{
  "query": {
    "filtered": {
      "filter": {
        "term": {"void": false}
      }
    }
  },
  "fields": ["user_id1", "user_name", "date", "status", "q1",
             "q1_unique_code", "q2", "q3"],
  "size": 5, "sort": ["date_value"]}'
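
Since you mentioned the real request lists 350 fields, it may be easier to build the body in code than by hand. A minimal sketch that produces both shapes (the original top-level filter and the filtered query) so they can be compared; the function names are made up for this example and nothing is sent to a server.

```python
import json

# Only the 8 fields shown in the question; the real list is longer.
FIELDS = ["user_id1", "user_name", "date", "status", "q1",
          "q1_unique_code", "q2", "q3"]


def top_level_filter_body(term_filter: dict) -> dict:
    """Original shape: the filter runs after the implicit match_all query."""
    return {
        "filter": term_filter,
        "fields": FIELDS,
        "size": 5,
        "sort": ["date_value"],
    }


def filtered_query_body(term_filter: dict) -> dict:
    """Suggested shape: the filter is wrapped in a filtered query."""
    return {
        "query": {"filtered": {"filter": term_filter}},
        "fields": FIELDS,
        "size": 5,
        "sort": ["date_value"],
    }


term = {"term": {"void": False}}
before = top_level_filter_body(term)
after = filtered_query_body(term)
print(json.dumps(after, indent=2))
```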



On Fri, Nov 21, 2014 at 7:07 AM, Ajay Divakaran ajay.divakara...@gmail.com
wrote:

 The term filter that is used:

 curl -XGET 'http://localhost:9200/my-index/my-doc-type/_search' -d '{
   "filter": {
     "term": {"void": false}
   },
   "fields": ["user_id1", "user_name", "date", "status", "q1",
              "q1_unique_code", "q2", "q3"],
   "size": 5, "sort": ["date_value"]}'


- The 'void' field is a boolean field.
- The index store size is 504mb.
- The Elasticsearch setup consists of a single node, and the index has a
  single shard and 0 replicas. The Elasticsearch version is 0.90.7.
- The fields listed above are only the first 8. The actual term filter that
  we execute lists 350 fields.

 *We noticed the memory spiking by about 2-3gb though the store size is
 only 504mb.*

 *Running the query multiple times seems to continuously increase the
 memory.*

 Could someone explain why this memory spike occurs?


Re: Bulk load performance

2014-11-19 Thread Nick Canzoneri
On the index settings side, you can dynamically disable refresh (set the
index refresh_interval to -1) and reduce the number of shard replicas for
the duration of the bulk import, then restore both once it finishes.

Described here:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-update-settings.html#bulk
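
A rough sketch of the two settings updates, built as plain request tuples rather than sent anywhere. The index name, host and the values restored afterwards ("1s" refresh, 1 replica) are assumptions for illustration; use whatever your index had before the import.

```python
import json


def bulk_load_settings() -> dict:
    """Settings to apply before the import: no refresh, no replicas."""
    return {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}


def restore_settings(refresh_interval: str, replicas: int) -> dict:
    """Settings to apply after the import, using the index's previous values."""
    return {
        "index": {
            "refresh_interval": refresh_interval,
            "number_of_replicas": replicas,
        }
    }


def settings_request(base_url: str, index: str, body: dict):
    """Describe the PUT <index>/_settings call as (method, url, json_body)."""
    url = "{}/{}/_settings".format(base_url.rstrip("/"), index)
    return ("PUT", url, json.dumps(body))


# Hypothetical host and index name; "1s" and 1 are assumed previous values.
before_import = settings_request("http://localhost:9200", "my-index",
                                 bulk_load_settings())
after_import = settings_request("http://localhost:9200", "my-index",
                                restore_settings("1s", 1))
print(before_import)
print(after_import)
```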

On Wed, Nov 19, 2014 at 2:53 AM, xaviertrujillo...@gmail.com wrote:

 Hello,

 I'm trying to do a bulk load of ~10M JSON docs (12.8Gb) with some
 geographical information into an elasticsearch index. With our current
 params, the loading is taking around 20-25 minutes to run, but we think it
 should be faster. Are these numbers similar to what other users are
 getting? Do you have any hints on how to get better performance? Any help
 will be appreciated. Please find the details below.

 Our ES cluster is version 1.1.1 with 11 nodes, and we are using
 Elasticsearch-MapReduce libraries 2.0.2 to do the bulk-load, setting the
 numbers of reducers to 11. Other params we use are:

 es.input.json=true
 es.mapping.id=id
 es.batch.size.bytes=10M
 es.batch.size.entries=1

 The average doc size is 1.3Kb, and each doc contains a bbox field with
 the shape definition like this:

 "bbox": {
   "type": "envelope",
   "coordinates": [
     [-77.08488844489459, 38.9502995339637],
     [-77.0844224567727, 38.9502305534064]
   ]
 }

 We are using the following mapping for this index, because these are the 3
 fields of our docs we are more interested in:

 {
   "properties": {
     "bbox": {
       "precision": "10m",
       "tree": "quadtree",
       "type": "geo_shape"
     },
     "id": {
       "type": "string",
       "index": "not_analyzed"
     },
     "streets": {
       "type": "string"
     }
   }
 }

 This is a typical output of the MapReduce job:

 14/11/17 09:05:44 INFO mapred.JobClient:   Elasticsearch Hadoop Counters
 14/11/17 09:05:44 INFO mapred.JobClient: Bulk Retries=0
 14/11/17 09:05:44 INFO mapred.JobClient: Bulk Retries Total Time(ms)=0
 14/11/17 09:05:44 INFO mapred.JobClient: Bulk Total=1375
 14/11/17 09:05:44 INFO mapred.JobClient: Bulk Total Time(ms)=11714959
 14/11/17 09:05:44 INFO mapred.JobClient: Bytes Accepted=14351811146
 14/11/17 09:05:44 INFO mapred.JobClient: Bytes Received=5498829
 14/11/17 09:05:44 INFO mapred.JobClient: Bytes Retried=0
 14/11/17 09:05:44 INFO mapred.JobClient: Bytes Sent=14351811146
 14/11/17 09:05:44 INFO mapred.JobClient: Documents Accepted=10129699
 14/11/17 09:05:44 INFO mapred.JobClient: Documents Received=0
 14/11/17 09:05:44 INFO mapred.JobClient: Documents Retried=0
 14/11/17 09:05:44 INFO mapred.JobClient: Documents Sent=10129699
 14/11/17 09:05:44 INFO mapred.JobClient: Network Retries=0
 14/11/17 09:05:44 INFO mapred.JobClient: Network Total Time(ms)=11732552
 14/11/17 09:05:44 INFO mapred.JobClient: Node Retries=0
 14/11/17 09:05:44 INFO mapred.JobClient: Scroll Total=0
 14/11/17 09:05:44 INFO mapred.JobClient: Scroll Total Time(ms)=0
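
 The counters above give a rough picture of the batch sizes actually used:
 roughly 7,400 documents and about 10 MB per bulk request, which is
 consistent with the es.batch.size.bytes=10M setting. A small sketch of that
 arithmetic, using only the numbers from the job output:

```python
# Counters copied from the MapReduce job output quoted above.
BULK_TOTAL = 1375
BYTES_SENT = 14351811146
DOCUMENTS_SENT = 10129699

docs_per_bulk = DOCUMENTS_SENT / BULK_TOTAL
bytes_per_bulk = BYTES_SENT / BULK_TOTAL
bytes_per_doc = BYTES_SENT / DOCUMENTS_SENT

print(f"documents per bulk request: {docs_per_bulk:.0f}")
print(f"MiB per bulk request:       {bytes_per_bulk / 2**20:.2f}")
print(f"bytes per document:         {bytes_per_doc:.0f}")
```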

 Thanks,
 Xavier.


Questions about the timeout search option

2014-11-17 Thread Nick Canzoneri
Link to appropriate documentation:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_search_options.html#_literal_timeout_literal_2

In particular: By default, the coordinating node waits to receive a
response from all shards. If one node is having trouble, it could slow down
the response to all search requests.

1. If I don't set the timeout, will the coordinating node really wait
forever?

2. When would this situation occur? I'm assuming that when cluster state is
yellow, the coordinating node knows not to request from the bad shard.
Is it just high load/GC on a particular node?

3. Any reason not to set this on all my requests to a relatively high value?
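
For context, this is roughly what I have in mind for question 3: add a timeout to every search body and check the timed_out flag in the response. A sketch only; the helper names, the "5s" value and the sample response are made up for illustration.

```python
def with_timeout(body: dict, timeout: str) -> dict:
    """Return a copy of a search body with a per-request timeout set."""
    new_body = dict(body)
    new_body["timeout"] = timeout
    return new_body


def is_partial(response: dict) -> bool:
    """True if the response reports that the search timed out."""
    return bool(response.get("timed_out", False))


original = {"query": {"match_all": {}}}
body = with_timeout(original, "5s")

# Made-up response, shaped like a search response with the timeout hit.
sample_response = {"took": 5003, "timed_out": True,
                   "hits": {"total": 0, "hits": []}}

print(body)
print("partial results:", is_partial(sample_response))
```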

Thanks,
