Re: Elasticsearch 1.4.2 JVM memory leak ?

2015-04-06 Thread mjdude5
How many shards and segments do you have?  I believe both shards and 
segments consume heap; for segments, merging them can reduce the memory 
footprint.

Are you graphing your heap usage?  I think what's useful is looking at the 
max(min(heap)) over a few days: assuming you're using the 75% old-gen 
occupancy threshold, you'll see heap usage drop significantly when the 
collector runs.  That baseline memory usage is useful to know.
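
For reference, a quick way to pull those numbers (a rough sketch, assuming the 
default HTTP port; these are the standard 1.x cat and node-stats endpoints):

  # shard and segment counts
  curl 'localhost:9200/_cat/shards?v'
  curl 'localhost:9200/_cat/segments?v'

  # per-node heap usage, old-gen pool size and GC counts
  curl 'localhost:9200/_nodes/stats/jvm?pretty'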

On Monday, April 6, 2015 at 12:13:17 AM UTC-4, Abhishek Andhavarapu wrote:
>
> Hi,
>
> We have about a 30 node elasticsearch cluster. We often run into out of 
> memory issues, and when I look at the JVM memory it is usually around 
> 75-80%, which is 24-25 GB of a 30 GB heap. The filter cache and field 
> cache add up to 5GB on a node; I'm trying to understand what the other 
> 15-20GB in the heap is. We have 10% caps for the filter and field caches. 
> What could the other 15-20GB in the heap be? We are on ES 1.4.2; are there 
> any known memory leaks? Could these be objects waiting for garbage 
> collection? If not, how would I know?
>
> Thanks,
> Abhishek
>



Re: Disable re balancing if one node is down.

2015-03-26 Thread mjdude5
You can get some reallocation control through 
cluster.routing.allocation.enable, but I don't think the exact behavior you 
describe is possible out of the box.  You could write a cluster-watching 
script that does what you describe using these settings, though.  One problem 
you will have is that when node2 comes back it needs to re-sync its replicas 
from the primaries, but you can stop rebalancing after one node goes down via 
cluster.routing.allocation.enable.
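
As a sketch of what such a script might do (the settings API as in 1.x; the 
values are examples, not a recommendation for your cluster):

  # when a node drops out, stop allocating/moving existing shards
  curl -XPUT 'localhost:9200/_cluster/settings' -d '{
    "transient": { "cluster.routing.allocation.enable": "none" }
  }'

  # once node2 is back and you are ready to let replicas re-sync
  curl -XPUT 'localhost:9200/_cluster/settings' -d '{
    "transient": { "cluster.routing.allocation.enable": "all" }
  }'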

On Wednesday, March 25, 2015 at 3:47:38 AM UTC-4, Cyril Cherian wrote:
>
> Imagine a case where I have
>
>- 3 (AWS) nodes
>- 1 index (let's call it friends) with 3 shards and 1 replica.
>
> Name Conventions
> S1 (Index friends primary shard 1)
> S2 (Index friends primary shard 2)
> S3 (Index friends primary shard 3)
> R1 (Replica of Shard 1)
> R2 (Replica of Shard 2)
> R3 (Replica of Shard 3)
>
> Let's say that Node1 has (S1, R2) and is the master
> Node2 has (S2 R3)
> Node3 has (S3 R1)
>
> Now, if Node 2 goes down due to a connection failure, rebalancing will 
> happen: Node 1 will promote the replica (R2) to primary, and a new replica 
> of shard 2 will be created on Node3.
> Finally, after rebalancing it will look like:
> Node1 has (S1, S2, R3) 
> Node3 has (S3, R1, R2)
>
> During this rebalancing, heavy IO operations happen and the Elasticsearch 
> cluster health will go red -> yellow -> green.
>
> My requirement is that if Node 2 is down, the nodes must not rebalance.
> I am OK if queries return results from only shards S1 and S3.
> And when Node 2 is back again, no rebalancing should happen.
>
> Is this possible? If yes, how?
> Thanks in advance.
>



Re: ES JVM memory usage consistently above 90%

2015-03-24 Thread mjdude5
As long as it's the first ES instance starting on that node it'll grab 0 
instead of 1.  I don't know if you can explicitly set the data node 
directory in the config.
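
One thing that might help, though I haven't verified it on your setup: limit 
each host to a single local storage node so ES always locks nodes/0 instead of 
creating nodes/1.  A sketch for elasticsearch.yml (both settings exist in 1.x; 
the path is only an example):

  path.data: /var/lib/elasticsearch      # example path, adjust to your install
  node.max_local_storage_nodes: 1        # refuse to start a second instance rather than create nodes/1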

On Tuesday, March 24, 2015 at 2:16:57 PM UTC-4, Yogesh wrote:
>
> Thanks. Mine is a one-node cluster so I am simply using curl (the shutdown 
> API) to shut it down and then doing "bin/elasticsearch -d" to start it up.
> To check if it has shut down, I try to hit it over HTTP. So, now how do I 
> start it with the nodes/0 data directory?
> There is nothing in the nodes/1 data directory, but I don't suppose 
> deleting it would be the solution? (Sorry for the basic questions, I am new 
> to this!)
>
> On Tue, Mar 24, 2015 at 11:32 PM, > wrote:
>
>> Sometimes that happens when the new node starts up via monit or other 
>> automated thing before the old node is fully shutdown.  I'd suggest 
>> shutting down the node and verify it's done via ps before allowing the new 
>> node to start up.  In the case of monit if the check is hitting the http 
>> port then it'll think it's down before it actually fully quits.
>>
>> On Tuesday, March 24, 2015 at 1:56:54 PM UTC-4, Yogesh wrote:
>>>
>>> Thanks a lot mjdude! It does seem like it attached to the wrong data 
>>> directory.
>>> In elasticsearch/data/tool/nodes there are two 0 and 1. My data is in 0 
>>> but node stats shows the data directory as elasticsearch/data/tool/nodes/
>>> 1.
>>> Now, how do I change this?
>>>
>>> On Tue, Mar 24, 2015 at 11:02 PM,  wrote:
>>>
 When it restarted did it attach to the wrong data directory?  Take a 
 look at _nodes/_local/stats?pretty and check the 'data' directory 
 location.  Has the cluster recovered after the restart?  Check 
 _cluster/health?pretty as well.

 On Tuesday, March 24, 2015 at 1:01:52 PM UTC-4, Yogesh wrote:
>
> Thanks Joel and mjdude. What I mean is that ES is using 99% of the 
> heap memory (I think since Marvel showed memory as 1 GB which corresponds 
> to the heap, my RAM is 50GB)
> I've increased the ES_HEAP_SIZE to 10g. But another problem has 
> appeared and I'm freaked out because of it!
>
> So, after restarting my ES (curl shutdown) to increase the heap, the 
> Marvel has stopped showing me my data (it still shows that disk memory is 
> lower so that means the data is still on disk) and upon searching Sense 
> shows "IndexMissingException[[my_new_twitter_river] missing]"
>
> Why is this happening?!?!
>
> On Mon, Mar 23, 2015 at 9:25 PM,  wrote:
>
>> Are you saying JVM is using 99% of the system memory or 99% of the 
>> heap?  If it's 99% of the available heap that's bad and you will have 
>> cluster instability.  I suggest increasing your JVM heap size if you 
>> can, I 
>> can't find it right now but I remember a blog post that used twitter as 
>> a 
>> benchmark and they also could get to ~50M documents with the default 1G 
>> heap.
>>
>> On Sunday, March 22, 2015 at 3:30:57 AM UTC-4, Yogesh wrote:
>>>
>>> Hi,
>>>
>>> I have set up elasticsearch on one node and am using the Twitter 
>>> river to index tweets. It has been going fine with almost 50M tweets 
>>> indexed so far in 13 days.
>>> When I started indexing, the JVM usage (observed via Marvel) hovered 
>>> between 10-20%, then started remaining around 30-40% but for the past 
>>> 3-4 
>>> days it has continuously been above 90%, reaching 99% at times!
>>> I restarted elasticsearch thinking it might get resolved but as soon 
>>> as I switched it back on, the JVM usage went back to 90%.
>>>
>>> Why is this happening and how can I remedy it? (The JVM memory is 
>>> the default 990.75MB)
>>>
>>> Thanks
>>> Yogesh
>>>

Re: ES JVM memory usage consistently above 90%

2015-03-24 Thread mjdude5
Sometimes that happens when the new node starts up, via monit or some other 
automation, before the old node has fully shut down.  I'd suggest shutting 
down the node and verifying it's gone via ps before allowing the new node to 
start up.  In the case of monit, if the check is hitting the HTTP port, it 
will think the node is down before the process actually fully quits.
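
Roughly what I mean, as a sketch (the local-node shutdown endpoint is the 1.x 
API; adjust the grep pattern and paths for your install):

  # ask the local node to shut down
  curl -XPOST 'localhost:9200/_cluster/nodes/_local/_shutdown'

  # wait until the old JVM is actually gone before starting the new one
  while ps aux | grep -q '[o]rg.elasticsearch.bootstrap.Elasticsearch'; do
    sleep 1
  done
  bin/elasticsearch -d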

On Tuesday, March 24, 2015 at 1:56:54 PM UTC-4, Yogesh wrote:
>
> Thanks a lot mjdude! It does seem like it attached to the wrong data 
> directory.
> In elasticsearch/data/tool/nodes there are two directories, 0 and 1. My data 
> is in 0, but node stats shows the data directory as 
> elasticsearch/data/tool/nodes/1.
> Now, how do I change this?
>
> On Tue, Mar 24, 2015 at 11:02 PM, > wrote:
>
>> When it restarted did it attach to the wrong data directory?  Take a look 
>> at _nodes/_local/stats?pretty and check the 'data' directory location.  Has 
>> the cluster recovered after the restart?  Check _cluster/health?pretty as 
>> well.
>>
>> On Tuesday, March 24, 2015 at 1:01:52 PM UTC-4, Yogesh wrote:
>>>
>>> Thanks Joel and mjdude. What I mean is that ES is using 99% of the heap 
>>> memory (I think since Marvel showed memory as 1 GB which corresponds to the 
>>> heap, my RAM is 50GB)
>>> I've increased the ES_HEAP_SIZE to 10g. But another problem has appeared 
>>> and I'm freaked out because of it!
>>>
>>> So, after restarting my ES (curl shutdown) to increase the heap, the 
>>> Marvel has stopped showing me my data (it still shows that disk memory is 
>>> lower so that means the data is still on disk) and upon searching Sense 
>>> shows "IndexMissingException[[my_new_twitter_river] missing]"
>>>
>>> Why is this happening?!?!
>>>
>>> On Mon, Mar 23, 2015 at 9:25 PM,  wrote:
>>>
 Are you saying JVM is using 99% of the system memory or 99% of the 
 heap?  If it's 99% of the available heap that's bad and you will have 
 cluster instability.  I suggest increasing your JVM heap size if you can, 
 I 
 can't find it right now but I remember a blog post that used twitter as a 
 benchmark and they also could get to ~50M documents with the default 1G 
 heap.

 On Sunday, March 22, 2015 at 3:30:57 AM UTC-4, Yogesh wrote:
>
> Hi,
>
> I have set up elasticsearch on one node and am using the Twitter river 
> to index tweets. It has been going fine with almost 50M tweets indexed so 
> far in 13 days.
> When I started indexing, the JVM usage (observed via Marvel) hovered 
> between 10-20%, then started remaining around 30-40% but for the past 3-4 
> days it has continuously been above 90%, reaching 99% at times!
> I restarted elasticsearch thinking it might get resolved but as soon 
> as I switched it back on, the JVM usage went back to 90%.
>
> Why is this happening and how can I remedy it? (The JVM memory is the 
> default 990.75MB)
>
> Thanks
> Yogesh
>
>
>



Re: ES JVM memory usage consistently above 90%

2015-03-24 Thread mjdude5
When it restarted did it attach to the wrong data directory?  Take a look 
at _nodes/_local/stats?pretty and check the 'data' directory location.  Has 
the cluster recovered after the restart?  Check _cluster/health?pretty as 
well.
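
Concretely, something like this (default port assumed; the grep is just a 
quick way to spot the data path inside the fs section):

  curl 'localhost:9200/_nodes/_local/stats?pretty' | grep -A 2 '"data"'
  curl 'localhost:9200/_cluster/health?pretty'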

On Tuesday, March 24, 2015 at 1:01:52 PM UTC-4, Yogesh wrote:
>
> Thanks Joel and mjdude. What I mean is that ES is using 99% of the heap 
> memory (I think, since Marvel showed memory as 1 GB, which corresponds to 
> the heap; my RAM is 50GB).
> I've increased ES_HEAP_SIZE to 10g. But another problem has appeared 
> and I'm freaked out because of it!
>
> So, after restarting my ES (curl shutdown) to increase the heap, Marvel 
> has stopped showing me my data (it still shows that disk space is 
> lower, so the data is still on disk), and upon searching, Sense 
> shows "IndexMissingException[[my_new_twitter_river] missing]"
>
> Why is this happening?!
>
> On Mon, Mar 23, 2015 at 9:25 PM, > wrote:
>
>> Are you saying JVM is using 99% of the system memory or 99% of the heap?  
>> If it's 99% of the available heap that's bad and you will have cluster 
>> instability.  I suggest increasing your JVM heap size if you can, I can't 
>> find it right now but I remember a blog post that used twitter as a 
>> benchmark and they also could get to ~50M documents with the default 1G 
>> heap.
>>
>> On Sunday, March 22, 2015 at 3:30:57 AM UTC-4, Yogesh wrote:
>>>
>>> Hi,
>>>
>>> I have set up elasticsearch on one node and am using the Twitter river 
>>> to index tweets. It has been going fine with almost 50M tweets indexed so 
>>> far in 13 days.
>>> When I started indexing, the JVM usage (observed via Marvel) hovered 
>>> between 10-20%, then started remaining around 30-40% but for the past 3-4 
>>> days it has continuously been above 90%, reaching 99% at times!
>>> I restarted elasticsearch thinking it might get resolved but as soon as 
>>> I switched it back on, the JVM usage went back to 90%.
>>>
>>> Why is this happening and how can I remedy it? (The JVM memory is the 
>>> default 990.75MB)
>>>
>>> Thanks
>>> Yogesh
>>>
>
>



Re: corrupted shard after optimize

2015-03-24 Thread mjdude5
Quick follow-up question: is it safe to run -fix while ES is also running on 
the node?  I understand that some documents will be lost.
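
For reference, the invocation I'm asking about (a sketch only: the classpath 
follows Mike's advice quoted below, and the shard path here is hypothetical):

  java -cp "/usr/share/elasticsearch/lib/*" -ea:org.apache.lucene... \
    org.apache.lucene.index.CheckIndex \
    /var/lib/elasticsearch/data/<cluster_name>/nodes/0/indices/1-2013/0/index -fix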

On Tuesday, March 24, 2015 at 10:24:26 AM UTC-4, mjd...@gmail.com wrote:
>
> Thanks for the CheckIndex info, that worked!  It looks like only one of 
> the segments in that shard has issues:
>
>   1 of 20: name=_1om docCount=216683
> codec=Lucene3x
> compound=false
> numFiles=10
> size (MB)=5,111.421
> diagnostics = {os=Linux, os.version=3.5.7, mergeFactor=7, 
> source=merge, lucene.version=3.6.0 1310449 - rmuir - 2012-04-06 11:31:16, 
> os.arch=amd64, mergeMaxNumSegments=-1, java.version=1.6.0_26, 
> java.vendor=Sun Microsystems Inc.}
> no deletions
> test: open reader.OK
> test: check integrity.OK
> test: check live docs.OK
> test: fields..OK [31 fields]
> test: field norms.OK [20 fields]
> test: terms, freq, prox...ERROR: java.lang.AssertionError: 
> index=216690, numBits=216683
> java.lang.AssertionError: index=216690, numBits=216683
> at org.apache.lucene.util.FixedBitSet.set(FixedBitSet.java:252)
> at 
> org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:932)
> at 
> org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:1325)
> at 
> org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:631)
> at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2051)
> test: stored fields...OK [3033562 total field count; avg 14 fields 
> per doc]
> test: term vectorsOK [0 total vector count; avg 0 term/freq 
> vector fields per doc]
> test: docvalues...OK [0 docvalues fields; 0 BINARY; 0 NUMERIC; 
> 0 SORTED; 0 SORTED_NUMERIC; 0 SORTED_SET]
> FAILED
> WARNING: fixIndex() would remove reference to this segment; full 
> exception:
> java.lang.RuntimeException: Term Index test failed
> at 
> org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:646)
> at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2051)
>
> This is on ES 1.3.4, but the index I was running optimize on was likely 
> created back in 0.9 or 1.0.
>
> On Tuesday, March 24, 2015 at 5:27:04 AM UTC-4, Michael McCandless wrote:
>>
>> Hmm, not good.
>>
>> Which version of ES?  Do you have a full stack trace for the exception?
>>
>> To run CheckIndex you need to add all ES jars to the classpath.  It's 
>> easiest to just use a wildcard for this, e.g.:
>>
>>   java -cp "/path/to/es-install/lib/*" org.apache.lucene.index.CheckIndex 
>> ...
>>
>> Make sure you have the double quotes so the shell does not expand that 
>> wildcard!
>>
>> Mike McCandless
>>
>> On Mon, Mar 23, 2015 at 9:50 PM,  wrote:
>>
>>> I did an optimize on this index and it looks like it caused a shard to 
>>> become corrupted.  Or maybe the optimize just brought the shard corruption 
>>> to light?
>>>
>>> On the node that reported the corrupted shard I tried shutting it down, 
>>> moving the shard out and then restarting. Unfortunately the next node that 
>>> got that shard then started with the same corruption issues.  The errors:
>>>
>>> Mar 24 01:40:17 localhost elasticsearch: [bma.0][WARN 
>>> ][indices.cluster  ] [Meteorite II] [1-2013][0] failed to start 
>>> shard
>>> Mar 24 01:40:17 localhost 
>>> org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: 
>>> [1-2013][0] failed to fetch index version after copying it over
>>> Mar 24 01:40:17 localhost elasticsearch: [bma.0][WARN 
>>> ][cluster.action.shard ] [Meteorite II] [1-2013][0] sending failed 
>>> shard for [1-2013][0], node[ZzXsIZCsTyWD2emFuU0idg], [P], s[INITIALIZING], 
>>> indexUUID [_na_], reason [Failed to start shard, message 
>>> [IndexShardGatewayRecoveryException[[1-2013][0] failed to fetch index 
>>> version after copying it over]; nested: CorruptIndexException[[1-2013][0] 
>>> Corrupted index [corrupted_OahNymObSTyBzCCPu1FuJA] caused by: 
>>> CorruptIndexException[docs out of order (1493829 <= 1493874 ) (docOut: 
>>> org.apache.lucene.store.RateLimitedIndexOutput@2901a3e1)]]; ]]
>>>
>>> I tried using CheckIndex, but had this issue:
>>>
>>> java.lang.IllegalArgumentException: A SPI class of type 
>>> org.apache.lucene.codecs.PostingsFormat with name 'es090' does not exist. 
>>> You need to add the corresponding JAR file supporting this SPI to your 
>>> classpath.The current classpath supports the following names: [Pulsing41, 
>>> SimpleText, Memory, BloomFilter, Direct, FSTPulsing41, FSTOrdPulsing41, 
>>> FST41, FSTOrd41, Lucene40, Lucene41]
>>>
>>> When running with:
>>>
>>> java -cp 
>>> /usr/share/elasticsearch/lib/lucene-codecs-4.9.1.jar:/usr/share/elasticsearch/lib/lucene-core-4.9.1.jar
>>>  
>>> -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex
>>>
>>> I'm not a java programmer so after I tried other classpath combinations 
>>> I was out of ideas.
>>>
>>>
>>> Any tips?  Looking at _cat/shards the replica is currently marked

Re: corrupted shard after optimize

2015-03-24 Thread mjdude5
Thanks for the CheckIndex info, that worked!  It looks like only one of the 
segments in that shard has issues:

  1 of 20: name=_1om docCount=216683
codec=Lucene3x
compound=false
numFiles=10
size (MB)=5,111.421
diagnostics = {os=Linux, os.version=3.5.7, mergeFactor=7, source=merge, 
lucene.version=3.6.0 1310449 - rmuir - 2012-04-06 11:31:16, os.arch=amd64, 
mergeMaxNumSegments=-1, java.version=1.6.0_26, java.vendor=Sun Microsystems 
Inc.}
no deletions
test: open reader.OK
test: check integrity.OK
test: check live docs.OK
test: fields..OK [31 fields]
test: field norms.OK [20 fields]
test: terms, freq, prox...ERROR: java.lang.AssertionError: 
index=216690, numBits=216683
java.lang.AssertionError: index=216690, numBits=216683
at org.apache.lucene.util.FixedBitSet.set(FixedBitSet.java:252)
at 
org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:932)
at 
org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:1325)
at 
org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:631)
at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2051)
test: stored fields...OK [3033562 total field count; avg 14 fields 
per doc]
test: term vectorsOK [0 total vector count; avg 0 term/freq 
vector fields per doc]
test: docvalues...OK [0 docvalues fields; 0 BINARY; 0 NUMERIC; 
0 SORTED; 0 SORTED_NUMERIC; 0 SORTED_SET]
FAILED
WARNING: fixIndex() would remove reference to this segment; full 
exception:
java.lang.RuntimeException: Term Index test failed
at 
org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:646)
at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2051)

This is on ES 1.3.4, but the index I was running optimize on was likely 
created back in 0.9 or 1.0.

On Tuesday, March 24, 2015 at 5:27:04 AM UTC-4, Michael McCandless wrote:
>
> Hmm, not good.
>
> Which version of ES?  Do you have a full stack trace for the exception?
>
> To run CheckIndex you need to add all ES jars to the classpath.  It's 
> easiest to just use a wildcard for this, e.g.:
>
>   java -cp "/path/to/es-install/lib/*" org.apache.lucene.index.CheckIndex 
> ...
>
> Make sure you have the double quotes so the shell does not expand that 
> wildcard!
>
> Mike McCandless
>
> On Mon, Mar 23, 2015 at 9:50 PM, > wrote:
>
>> I did an optimize on this index and it looks like it caused a shard to 
>> become corrupted.  Or maybe the optimize just brought the shard corruption 
>> to light?
>>
>> On the node that reported the corrupted shard I tried shutting it down, 
>> moving the shard out and then restarting. Unfortunately the next node that 
>> got that shard then started with the same corruption issues.  The errors:
>>
>> Mar 24 01:40:17 localhost elasticsearch: [bma.0][WARN 
>> ][indices.cluster  ] [Meteorite II] [1-2013][0] failed to start 
>> shard
>> Mar 24 01:40:17 localhost 
>> org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: 
>> [1-2013][0] failed to fetch index version after copying it over
>> Mar 24 01:40:17 localhost elasticsearch: [bma.0][WARN 
>> ][cluster.action.shard ] [Meteorite II] [1-2013][0] sending failed 
>> shard for [1-2013][0], node[ZzXsIZCsTyWD2emFuU0idg], [P], s[INITIALIZING], 
>> indexUUID [_na_], reason [Failed to start shard, message 
>> [IndexShardGatewayRecoveryException[[1-2013][0] failed to fetch index 
>> version after copying it over]; nested: CorruptIndexException[[1-2013][0] 
>> Corrupted index [corrupted_OahNymObSTyBzCCPu1FuJA] caused by: 
>> CorruptIndexException[docs out of order (1493829 <= 1493874 ) (docOut: 
>> org.apache.lucene.store.RateLimitedIndexOutput@2901a3e1)]]; ]]
>>
>> I tried using CheckIndex, but had this issue:
>>
>> java.lang.IllegalArgumentException: A SPI class of type 
>> org.apache.lucene.codecs.PostingsFormat with name 'es090' does not exist. 
>> You need to add the corresponding JAR file supporting this SPI to your 
>> classpath.The current classpath supports the following names: [Pulsing41, 
>> SimpleText, Memory, BloomFilter, Direct, FSTPulsing41, FSTOrdPulsing41, 
>> FST41, FSTOrd41, Lucene40, Lucene41]
>>
>> When running with:
>>
>> java -cp 
>> /usr/share/elasticsearch/lib/lucene-codecs-4.9.1.jar:/usr/share/elasticsearch/lib/lucene-core-4.9.1.jar
>>  
>> -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex
>>
>> I'm not a java programmer so after I tried other classpath combinations I 
>> was out of ideas.
>>
>>
>> Any tips?  Looking at _cat/shards the replica is currently marked 
>> "unassigned" while the primary is "initializing".  Thanks!
>>

corrupted shard after optimize

2015-03-23 Thread mjdude5
I did an optimize on this index and it looks like it caused a shard to 
become corrupted.  Or maybe the optimize just brought the shard corruption 
to light?

On the node that reported the corrupted shard I tried shutting it down, 
moving the shard out and then restarting. Unfortunately the next node that 
got that shard then started with the same corruption issues.  The errors:

Mar 24 01:40:17 localhost elasticsearch: [bma.0][WARN 
][indices.cluster  ] [Meteorite II] [1-2013][0] failed to start 
shard
Mar 24 01:40:17 localhost 
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: 
[1-2013][0] failed to fetch index version after copying it over
Mar 24 01:40:17 localhost elasticsearch: [bma.0][WARN 
][cluster.action.shard ] [Meteorite II] [1-2013][0] sending failed 
shard for [1-2013][0], node[ZzXsIZCsTyWD2emFuU0idg], [P], s[INITIALIZING], 
indexUUID [_na_], reason [Failed to start shard, message 
[IndexShardGatewayRecoveryException[[1-2013][0] failed to fetch index 
version after copying it over]; nested: CorruptIndexException[[1-2013][0] 
Corrupted index [corrupted_OahNymObSTyBzCCPu1FuJA] caused by: 
CorruptIndexException[docs out of order (1493829 <= 1493874 ) (docOut: 
org.apache.lucene.store.RateLimitedIndexOutput@2901a3e1)]]; ]]

I tried using CheckIndex, but had this issue:

java.lang.IllegalArgumentException: A SPI class of type 
org.apache.lucene.codecs.PostingsFormat with name 'es090' does not exist. 
You need to add the corresponding JAR file supporting this SPI to your 
classpath.The current classpath supports the following names: [Pulsing41, 
SimpleText, Memory, BloomFilter, Direct, FSTPulsing41, FSTOrdPulsing41, 
FST41, FSTOrd41, Lucene40, Lucene41]

When running with:

java -cp /usr/share/elasticsearch/lib/lucene-codecs-4.9.1.jar:/usr/share/elasticsearch/lib/lucene-core-4.9.1.jar \
  -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex

I'm not a Java programmer, so after trying other classpath combinations I 
was out of ideas.


Any tips?  Looking at _cat/shards the replica is currently marked 
"unassigned" while the primary is "initializing".  Thanks!



Re: Recommendations for health monitoring

2015-03-23 Thread mjdude5
You probably want to monitor each node as well; _nodes/stats has useful 
disk/cpu/heap/GC stats.  It also has information about thread usage and 
completed tasks, which you can use to monitor search/index growth.

I don't fully know the answer to #2, but I assume the _nodes & _cluster calls 
are served by management threads.  We hit _nodes/stats and _cluster/health 
every 5 minutes and haven't seen any issues.  Depending on your cluster size, 
I don't know if I'd go as low as 60 seconds; _nodes/stats can take some time 
to gather if there are a lot of nodes.
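
As a sketch, the two calls we poll (default port assumed; level=indices 
matches what you're already doing from the client):

  curl 'localhost:9200/_cluster/health?level=indices&pretty'
  curl 'localhost:9200/_nodes/stats?pretty'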

On Monday, March 23, 2015 at 11:11:36 AM UTC-4, Joel Potischman wrote:
>
> We currently monitor our app by having a monitoring tool (Pingdom) 
> retrieve a health page from our app that retrieves and displays the 
> Elasticsearch cluster info, e.g.
>
> {
> "status": 200,
> "name": "whatever",
> "cluster_name": "whatever_dev",
> "version": {
> "number": "1.4.4",
> "build_hash": "c38f773fc81201d1abdfde1ca2746fab58efa912",
> "build_timestamp": "2015-02-19T13:05:36Z",
> "build_snapshot": false,
> "lucene_version": "4.10.3"
> },
> "tagline": "You Know, for Search"
> }
>
> If the monitoring process can't reach our app, or our app can't reach 
> Elasticsearch, we'll get an error and an alert; however, this doesn't tell 
> us anything about node and index health. I've made a page that calls 
> ClusterClient.health(level='indices') but want to confirm:
>
>1. Is this sufficient for surfacing any issue with our Elasticsearch 
>infrastructure? and
>2. Does this call block query requests/backups, consume a lot of 
>resources, or otherwise create impacts such that we wouldn't want to be 
>calling it every 60 seconds 24x7?
>
> We don't need to have our monitoring page give us a full diagnosis of all 
> conceivable issues, we just need it to trigger an alert that there *is* an 
> issue so we know we have some work to do, while having minimal impact on 
> overall application performance.
>
> Any recommendations on what we should monitor to achieve those two 
> mandates would be greatly appreciated.
>
> Thanks,
>
> -joel
>



Re: ES JVM memory usage consistently above 90%

2015-03-23 Thread mjdude5
Are you saying the JVM is using 99% of system memory or 99% of the heap?  
If it's 99% of the available heap, that's bad and you will have cluster 
instability.  I suggest increasing your JVM heap size if you can.  I can't 
find it right now, but I remember a blog post that used Twitter as a 
benchmark, and they also could only get to ~50M documents with the default 1G 
heap.
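
If it helps, a sketch of how the heap is usually bumped for a manual install 
(the 4g value is only an example, not a recommendation for your box; package 
installs set this in their service environment file instead):

  export ES_HEAP_SIZE=4g
  bin/elasticsearch -d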

On Sunday, March 22, 2015 at 3:30:57 AM UTC-4, Yogesh wrote:
>
> Hi,
>
> I have set up elasticsearch on one node and am using the Twitter river to 
> index tweets. It has been going fine with almost 50M tweets indexed so far 
> in 13 days.
> When I started indexing, the JVM usage (observed via Marvel) hovered 
> between 10-20%, then settled around 30-40%, but for the past 3-4 
> days it has been continuously above 90%, reaching 99% at times!
> I restarted elasticsearch thinking it might get resolved, but as soon as I 
> switched it back on, the JVM usage went back to 90%.
>
> Why is this happening and how can I remedy it? (The JVM memory is the 
> default 990.75MB)
>
> Thanks
> Yogesh
>



Re: Growing old-gen size

2015-03-23 Thread mjdude5
Thanks, do you know if there's more memory metric reporting that I'm 
missing? I'd like to figure out what's growing the fastest/largest.  
Fielddata I think should show up in the 'fm' column of the node stats.  I'm 
mostly curious what I'm missing in adding up the memory requirements, from 
the node stats I have so far fielddata and segment memory are the two 
dominant components but they don't add up to more than 50% of the max heap.

I know there are some things missing from the stats like metadata memory 
usage, but I'm assuming those are smaller components.

On Sunday, March 22, 2015 at 6:42:29 PM UTC-4, Mark Walkom wrote:
>
> Sounds like you're just utilising your cluster to its capacity.
>
> If you are seeing GC causing nodes to drop out you probably want to 
> consider either moving to doc values, reducing your dataset or adding more 
> nodes/heap.
>
> On 21 March 2015 at 07:25, > wrote:
>
>> Hello, I'm trying to better understand our ES memory usage in relation 
>> to our workload.  Right now 2 of our nodes have above-average heap usage.  
>> Looking at _stats, their old gen is large, ~6gb.  Here's the _cat output 
>> I've been trying to make sense of:
>>
>> _cat/nodes?v&h=host,v,j,hm,fm,fcm,sm,siwm,svmm,sc,pm,im,fce,fe,hp
>>
>> host   v      j         hm     fm       fcm     sm     siwm     svmm   sc     pm   im  fce  fe  hp
>> host1  1.3.4  1.7.0_71  7.9gb  1.2gb    34.3mb  2.8gb  1.3mb    7.4kb  13144  -1b  0b  0    0   82
>> host2  1.3.4  1.7.0_71  7.9gb  888.2mb  20.3mb  1.9gb  0b       0b     8962   -1b  0b  0    0   67
>> host3  1.3.4  1.7.0_71  7.9gb  1.1gb    29mb    2.5gb  0b       0b     11070  -1b  0b  0    0   70
>> host4  1.3.4  1.7.0_71  7.9gb  845.2mb  21.6mb  1.8gb  179.8kb  448b   8024   -1b  0b  0    0   55
>> host5  1.3.4  1.7.0_71  7.9gb  1.3gb    40.7mb  2.8gb  0b       0b     12615  -1b  0b  0    0   83
>>
>> When host1 and host5 do GC, it looks like they only drop ~5-10%, so they 
>> bump against the 75% mark again very soon afterwards.  Their old gen stays 
>> relatively big; host5 currently has ~6gb of old gen.
>>
>> Last week we had an incident where a node started having long GC times 
>> and then eventually dropped out of the cluster, so that's the fear.  It 
>> didn't seem like the GC was making any progress; it wasn't actually 
>> reducing the memory used.
>>
>> There must be something using heap that isn't reflected in this _cat 
>> output.  The collection_count for the old gen is increasing, but the 
>> used_in_bytes isn't significantly decreasing.  Is that expected?
>>
>> thanks for any tips!
>>
>
>



Growing old-gen size

2015-03-20 Thread mjdude5
Hello, I'm trying to better understand our ES memory usage in relation to 
our workload.  Right now 2 of our nodes have above-average heap usage.  
Looking at _stats, their old gen is large, ~6gb.  Here's the _cat output I've 
been trying to make sense of:

_cat/nodes?v&h=host,v,j,hm,fm,fcm,sm,siwm,svmm,sc,pm,im,fce,fe,hp

host   v      j         hm     fm       fcm     sm     siwm     svmm   sc     pm   im  fce  fe  hp
host1  1.3.4  1.7.0_71  7.9gb  1.2gb    34.3mb  2.8gb  1.3mb    7.4kb  13144  -1b  0b  0    0   82
host2  1.3.4  1.7.0_71  7.9gb  888.2mb  20.3mb  1.9gb  0b       0b     8962   -1b  0b  0    0   67
host3  1.3.4  1.7.0_71  7.9gb  1.1gb    29mb    2.5gb  0b       0b     11070  -1b  0b  0    0   70
host4  1.3.4  1.7.0_71  7.9gb  845.2mb  21.6mb  1.8gb  179.8kb  448b   8024   -1b  0b  0    0   55
host5  1.3.4  1.7.0_71  7.9gb  1.3gb    40.7mb  2.8gb  0b       0b     12615  -1b  0b  0    0   83
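
For readability, the same data can be requested with long column names (a 
sketch; I'm assuming the short aliases above expand as in the 1.x _cat/nodes 
headers):

_cat/nodes?v&h=host,version,jdk,heap.max,fielddata.memory_size,filter_cache.memory_size,segments.memory,segments.index_writer_memory,segments.version_map_memory,segments.count,percolate.memory_size,id_cache.memory_size,fielddata.evictions,filter_cache.evictions,heap.percent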

When host1 and host5 do GC, it looks like they only drop ~5-10%, so they bump 
against the 75% mark again very soon afterwards.  Their old gen stays 
relatively big; host5 currently has ~6gb of old gen.

Last week we had an incident where a node started having long GC times and 
then eventually dropped out of the cluster, so that's the fear.  It didn't 
seem like the GC was making any progress; it wasn't actually reducing the 
memory used.

There must be something using heap that isn't reflected in this _cat 
output.  The collection_count for the old gen is increasing, but the 
used_in_bytes isn't significantly decreasing.  Is that expected?

thanks for any tips!
