Re: Elasticsearch cluster fails to stabilize

2013-12-18 Thread Martin Forssen
After some further debugging, I enabled debug logging on one of the nodes. Now, 
when I try to get the indices stats, I get the following in the log on the 
debugging node:
[2013-12-18 08:02:01,078][DEBUG][index.shard.service  ] [NODE6] [reference][10] Can not build 'completion stats' from engine shard state [RECOVERING]
org.elasticsearch.index.shard.IllegalIndexShardStateException: [reference][10] CurrentState[RECOVERING] operations only allowed when started/relocated
    at org.elasticsearch.index.shard.service.InternalIndexShard.readAllowed(InternalIndexShard.java:765)
    at org.elasticsearch.index.shard.service.InternalIndexShard.acquireSearcher(InternalIndexShard.java:600)
    at org.elasticsearch.index.shard.service.InternalIndexShard.acquireSearcher(InternalIndexShard.java:595)
    at org.elasticsearch.index.shard.service.InternalIndexShard.completionStats(InternalIndexShard.java:536)
    at org.elasticsearch.action.admin.indices.stats.CommonStats.init(CommonStats.java:151)
    at org.elasticsearch.indices.InternalIndicesService.stats(InternalIndicesService.java:212)
    at org.elasticsearch.node.service.NodeService.stats(NodeService.java:165)
    at org.elasticsearch.action.admin.cluster.node.stats.TransportNodesStatsAction.nodeOperation(TransportNodesStatsAction.java:100)
    at org.elasticsearch.action.admin.cluster.node.stats.TransportNodesStatsAction.nodeOperation(TransportNodesStatsAction.java:43)
    at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:273)
    at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:264)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:270)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

Looking in the head plugin I see that this node has a number of green shards, 
but shard 10 is yellow (recovering). This smells like a bug.
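A client-side sketch of the workaround implied here: before requesting per-shard stats, list the shards that are not yet STARTED and expect stats gaps for them. The routing-table shape below is a simplified stand-in for illustration, not verbatim `_cluster/state` output.

```python
# Sketch: given a simplified routing table (index -> shard id -> state),
# report shards that are not yet STARTED, so a client can anticipate
# per-shard stats being unavailable during recovery.

def unstarted_shards(routing_table):
    """Return (index, shard_id, state) tuples for shards not in STARTED state."""
    out = []
    for index, shards in routing_table.items():
        for shard_id, state in shards.items():
            if state != "STARTED":
                out.append((index, shard_id, state))
    return out

sample = {
    "reference": {0: "STARTED", 10: "RECOVERING"},
    "other": {0: "STARTED"},
}
print(unstarted_shards(sample))  # [('reference', 10, 'RECOVERING')]
```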

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/62fdf587-7779-4814-97a6-f1381993eba5%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: replace DailyRollingFileAppender for index slow log

2013-12-18 Thread Olivier Morel
I found some discussion about the same problem, but it doesn't say how to
resolve it!



2013/12/18 joergpra...@gmail.com joergpra...@gmail.com

 Can you give more info about log type indexSlowLog? Did you write an
 implementation of org.apache.log4j.IndexSlowLogAppender - and why?

 Look here

 https://groups.google.com/forum/#!topic/elasticsearch/pPRXkI9P2hA

 and here


 http://www.elasticsearch.org/blog/logging-elasticsearch-events-with-logstash-and-elasticsearch/

 for how to set a standard log4j appender class, or your favorite custom class
 name, in type.

 Jörg
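For what it's worth, a rough logging.yml sketch of swapping the slow-log appender from the default date-based dailyRollingFile to a size-based rollingFile (keys follow log4j setter names; the file name and size limits here are illustrative, so check them against your ES version's default logging.yml):

```yaml
appender:
  index_search_slow_log_file:
    type: rollingFile            # instead of dailyRollingFile
    file: ${path.logs}/${cluster.name}_index_search_slow_log.log
    maxFileSize: 10MB            # roll by size rather than by date
    maxBackupIndex: 5
    layout:
      type: pattern
      conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"
```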





-- 
Regards

Olivier Morel
tel : 06.62.25.03.77



Re: spring-data elasticsearch support?

2013-12-18 Thread Mohsin Husen
Currently this feature is not supported, but it looks like a very handy and 
useful feature; we will think about it soon.
Please check my latest reply at 
https://jira.springsource.org/browse/DATAES-41

HTH
Mohsin

On Wednesday, 18 December 2013 15:32:02 UTC, Andra Bennett wrote:

 Hi David, 

 I am able to push a mapping in Java, but I don't see how I can merge that with 
 an externally defined mapping as you suggest below. 

 But, is the spring-elasticsearch library compatible with the elasticsearch 
 spring-data library?

 Thanks!
 Andra

 On Monday, December 16, 2013 1:56:18 PM UTC-5, David Pilato wrote:

 I guess that you basically have to send a mapping.
 I think that once you have a Node you can get a client from it and push a 
 mapping in Java, right?

 I did not play yet with spring data project for ES but I will soon.
 As author of https://github.com/dadoonet/spring-elasticsearch, I will 
 probably do it like this with it.

 Not sure it helps.

 --
 David ;-)
 Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


 On 16 Dec 2013, at 19:23, Andra Bennett and...@gmail.com wrote:

 Hello, 

 I am using spring-data elasticsearch 
 (https://github.com/spring-projects/spring-data-elasticsearch) to 
 configure my node and index my data.

 I'd like to know how to enable the timestamp feature for the indexing, 
 and how to map it to a custom field. 

 Does anyone on the elasticsearch list have experience with the spring-data 
 elasticsearch library? Or is this question better suited for the Spring forum?

 Thanks, 
 Andra






1.0.0.Beta2 OOM logs

2013-12-18 Thread joergpra...@gmail.com
I ran some heap scaling tests to find out the required heap for my
workload.

My config: an ES 1.0.0.Beta2 cluster, 3 RHEL 6.3 nodes, Java 1.8.0-ea JVM
25.0-b56, 4GB heap, G1 GC (and some tuning for segment merge and bulk).

Workload: mixed, scan/scroll query over 1.6m docs plus term queries over
20m docs (unknown queries per second, but higher than 5000) with bulk
indexing (5000 docs per second)

The 4GB exercise ended in OOM on all nodes after an hour's run, with all
kinds of error messages. The cluster restarted OK afterwards, so no harm
done. Increasing the heap to 6GB and redoing the exercise succeeded
after 52 minutes.

I just want to share the OOM logs with anyone who might be interested to
have a look, because they are so pretty ;)

https://gist.github.com/jprante/8024139

FYI, I'm considering a shard-level memory watchdog that would detect a
low-free-heap condition in time and return warnings to the bulk client, so
the bulk client could throttle, suspend, or exit indexing cleanly before
OOMs break out in the cluster, with all the attendant risk of crashed
shards or node dropouts. Surely not an exact science, but with some
heuristics it should work (e.g. below a threshold of 10MB free heap, no
further execution of the indexing engine should be allowed).
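The heuristic could look something like this sketch (the thresholds and function names are made up for illustration, not an ES API):

```python
# Sketch of the watchdog heuristic described above: map the free-heap
# ratio to an action the bulk client could take.

def watchdog_action(free_heap_bytes, max_heap_bytes,
                    warn_ratio=0.15, stop_ratio=0.05):
    """Return 'ok', 'throttle', or 'stop' based on remaining heap."""
    ratio = free_heap_bytes / max_heap_bytes
    if ratio < stop_ratio:
        return "stop"       # refuse further indexing before OOM hits
    if ratio < warn_ratio:
        return "throttle"   # warn the bulk client to slow down
    return "ok"

print(watchdog_action(3_000_000_000, 4_000_000_000))  # ok
print(watchdog_action(400_000_000, 4_000_000_000))    # throttle
print(watchdog_action(100_000_000, 4_000_000_000))    # stop
```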

Would love to have more time for testing the exciting new 1.0.0.Beta2
features, but right now I'm just happy to run my data reconciliations
successfully.

Jörg



autocomplete mapping error

2013-12-18 Thread Joshua Corb
We are running into mapping issues. We have tried on both Ubuntu and OS X, 
using Java 1.7. Attached is our node configuration, and below is the error 
we are getting.

11:12:47,856 WARN  [org.elasticsearch.indices.cluster] (elasticsearch[Nemesis][clusterService#updateTask][T#1]) [Nemesis] [need] failed to add mapping [need], source [{need:{properties:{defaultTextValue:{type:string},description:{type:string},endDate:{type:string},id:{type:string},imageURL:{type:string},location:{type:geo_point},mainCategory:{type:string,store:true,index_analyzer:keyword},postDate:{type:string},subCategory:{type:string,store:true,index_analyzer:keyword},tags:{type:string,store:true,index_analyzer:keyword},title:{type:multi_field,fields:{title:{type:string},title.autocomplete:{type:string,store:true,index_analyzer:autocomplete,search_analyzer:autocomplete_search,include_in_all:false},title.untouched:{type:string,index:not_analyzed,omit_norms:true,index_options:docs,include_in_all:false}}},user:{type:string,store:true,index_analyzer:keyword},version:{type:long]:
org.elasticsearch.index.mapper.MapperParsingException: Analyzer [autocomplete] not found for field [title.autocomplete]
    at org.elasticsearch.index.mapper.core.TypeParsers.parseField(TypeParsers.java:107) [elasticsearch-0.90.7.jar:]
    at org.elasticsearch.index.mapper.core.StringFieldMapper$TypeParser.parse(StringFieldMapper.java:150) [elasticsearch-0.90.7.jar:]
    at org.elasticsearch.index.mapper.multifield.MultiFieldMapper$TypeParser.parse(MultiFieldMapper.java:130) [elasticsearch-0.90.7.jar:]
    at org.elasticsearch.index.mapper.object.ObjectMapper$TypeParser.parseProperties(ObjectMapper.java:263) [elasticsearch-0.90.7.jar:]
    at org.elasticsearch.index.mapper.object.ObjectMapper$TypeParser.parse(ObjectMapper.java:219) [elasticsearch-0.90.7.jar:]
    at org.elasticsearch.index.mapper.DocumentMapperParser.parse(DocumentMapperParser.java:176) [elasticsearch-0.90.7.jar:]
    at org.elasticsearch.index.mapper.MapperService.parse(MapperService.java:314) [elasticsearch-0.90.7.jar:]
    at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:193) [elasticsearch-0.90.7.jar:]
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.processMapping(IndicesClusterStateService.java:417) [elasticsearch-0.90.7.jar:]
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyMappings(IndicesClusterStateService.java:381) [elasticsearch-0.90.7.jar:]
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:179) [elasticsearch-0.90.7.jar:]
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:414) [elasticsearch-0.90.7.jar:]
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:135) [elasticsearch-0.90.7.jar:]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_45]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_45]
    at java.lang.Thread.run(Thread.java:744) [rt.jar:1.7.0_45]
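The usual cause of this error is that the index was created without an analysis section defining the analyzers the mapping refers to; they must exist in the index settings (or elasticsearch.yml) at index-creation time. A hedged sketch of settings that would satisfy this mapping (the filter name and gram sizes are illustrative):

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edgeNGram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "autocomplete_filter"]
        },
        "autocomplete_search": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
```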



elasticsearch.yml
Description: Binary data


Re: TransportClient failures with 0.90.3 cluster, but NodeClient works without failures

2013-12-18 Thread InquiringMind
Jörg,

*Beside the cluster node JVMs you also have to take care of the client JVM. 
 Are you accessing the cluster also with Solaris x86 and Java 6u18?*


Oooh. Don't know why this didn't occur to me. Short answer: No.

Java on the MacBook (where the client / driver runs):

$ java -version
java version 1.6.0_65
Java(TM) SE Runtime Environment (build 1.6.0_65-b14-462-10M4609)
Java HotSpot(TM) 64-Bit Server VM (build 20.65-b04-462, mixed mode)

Java on all 3 virtual Solaris hosts of the remote 3-node ES cluster:

$ java -version
java version 1.6.0_18
Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
Java HotSpot(TM) Client VM (build 16.0-b13, mixed mode, sharing)
 


 *Can you give more information about noticeably quicker? What do you 
 test and how much load? Searching or indexing?*

 The ES index and type are mapped to enable TTL with a default 10s TTL 
value for all documents, and the default 60s TTL interval for the index. 
I've disabled the indexing for all fields; documents are queried only by 
their _id. Ad-hoc queries weren't necessary for this particular test case.
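For reference, a TTL-enabled mapping of the kind described here looks roughly like this (the type name is illustrative):

```json
{
  "mappings": {
    "doc": {
      "_ttl": { "enabled": true, "default": "10s" }
    }
  }
}
```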

The remote far-away 3-node ES cluster is running on 3 Solaris x86-64 VMs 
using Zen unicast discovery. There is also a single-node ES cluster running 
on the MacBook.

The driver contains a writer thread pool, a reader thread pool, and 
BigQueue in the middle. All of the runs below were configured with 8 writer 
threads and 8 reader threads. For all tests, the driver was run on the 
MacBook.

The writer threads obtain a unique object, serialize it to JSON, add it to 
ES, and then add it to the queue. The reader threads read from the queue, 
deserialize into the object, and query by index+type+id to verify it's 
there.
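A minimal sketch of that driver design: writer threads "index" a document into a store (a plain dict standing in for ES here) and then enqueue its id; reader threads dequeue and verify the document is retrievable. All names are illustrative, not the actual driver code.

```python
# Writer/reader thread pools with a queue in the middle, modeled on the
# driver described above. A dict stands in for the ES index.
import json
import queue
import threading

store = {}                      # stand-in for the ES index
store_lock = threading.Lock()
q = queue.Queue()
verified = []

def writer(ids):
    for i in ids:
        doc = json.dumps({"id": i})        # serialize to JSON
        with store_lock:
            store[i] = doc                 # "index" the document
        q.put(i)                           # hand off to the readers

def reader(n):
    for _ in range(n):
        i = q.get()
        with store_lock:
            found = i in store             # "query" by id
        verified.append(found)

w = threading.Thread(target=writer, args=(range(100),))
r = threading.Thread(target=reader, args=(100,))
w.start(); r.start(); w.join(); r.join()
print(all(verified), len(verified))  # True 100
```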

The timing values are shown by the driver using the super-cool TimeValue 
object class. Nice touch!

*1. Local single-node ES cluster and the driver, all running on the 
MacBook. The TransportClient is a little bit faster than the NodeClient:*

1a. Using the TransportClient:

  generated-connections=1629365 elapsed=5m conn/sec=5430
  [db-update: total=1629358 time=32.5m time/update=1.1ms]
  [db-query: total=1629357 time=16.8m time/query=620.1micros]
  [queue: current=0 max=373]

1b. Using the NodeClient:

  generated-connections=1551379 elapsed=5m conn/sec=5171
  [db-update: total=1551371 time=32.8m time/update=1.2ms]
  [db-query: total=1551371 time=16.4m time/query=637.7micros]
  [queue: current=0 max=383]

*2. Driver running locally on the MacBook connected to the far-away 3-node 
ES cluster running on Solaris x86-64 VMs. In this case, the NodeClient was 
seen to be faster, particularly in the area of updates.*

2a. The driver uses a TransportClient but only 2 of the 3 nodes are added 
to its list of inet addresses:

  generated-connections=13427 elapsed=5m conn/sec=44 
  [db-update: total=13419 time=39.9m time/update=178.4ms] 
  [db-query: total=13419 time=18.7m time/query=83.6ms]
  [queue: current=0 max=7]

2b. The driver uses a client-only NodeClient with Zen unicast discovery and 
all 3 nodes configured for it:

  generated-connections=27592 elapsed=5m conn/sec=91 
  [db-update: total=27584 time=39.8m time/update=86.7ms] 
  [db-query: total=27584 time=38.7m time/query=84.2ms]
  [queue: current=0 max=26]

Regards,
Brian 



Shard relocation progress

2013-12-18 Thread Mohit Anchlia
What's the best way to monitor the shard relocation that occurs when one adds
new nodes?

Is there a way to control the relocation and do it manually, a few shards
at a time?

What are the best practices for a cluster that constantly receives high-volume
traffic?
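One possible approach (setting names as of the 0.90/1.0 line; the values are illustrative): throttle rebalancing with a transient update like the following body, sent as a PUT to /_cluster/settings, and then watch the relocating_shards count in /_cluster/health while shards move:

```json
{
  "transient": {
    "cluster.routing.allocation.cluster_concurrent_rebalance": 1,
    "cluster.routing.allocation.node_concurrent_recoveries": 2
  }
}
```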



Re: Very slow ElasticSearch Index

2013-12-18 Thread Mark Walkom
Intra-cluster comms are handled over ES's internal transport protocol rather
than HTTP. What is the link between your DCs like: 100M, 1G, 10G?

You could try using something like logstash to replicate the indexes; that
way you can have two clusters, and it should reduce any latency.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 19 December 2013 03:10, lekkie omotayo lekkie.ay...@gmail.com wrote:

 Are there any other protocols than HTTP I can send requests over? Something
 faster than HTTP? Or do you mean the physical NIC? We run on a LAN.

 On Wednesday, 18 December 2013 11:49:07 UTC+1, Mark Walkom wrote:

 The lag over that inter-DC link is probably causing your issues.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 18 December 2013 18:12, lekkie omotayo lekkie...@gmail.com wrote:

 Yes they all have the same capacity.

 Yes, they are in different data centers (off-site).


 On Tuesday, 17 December 2013 22:33:07 UTC+1, Mark Walkom wrote:

 ES will only go as fast as the slowest node. With that in mind, are
 your DR nodes the same capacity?

 I also notice they are in different subnets, does that imply they are
 in different datacenters?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 18 December 2013 00:44, lekkie omotayo lekkie...@gmail.com wrote:

 Thanks for the insight.

  First tip would be to drop OpenJDK and move to Oracle, you'll get
 a lot better performance.

 So I changed to the Oracle JDK and latency dropped from ~4000 ms to
 around 2500 ms.

  It might also be worth removing indices.ttl.interval and just
 using a script to delete old indices as TTL searches can use a fair bit of
 resources.

 We also dropped indices.ttl.interval, and latency further dropped to
 1500 ms.


  You also mentioned you have 2 nodes, but there are a lot more IPs
 listed in the discovery hosts, is that intentional? Same for
 minimum_master_nodes being 3.
 Yes, the other 2 nodes are DR nodes. So we basically have 4 nodes, but
 2 are for disaster recovery. The discovery.zen.minimum_master_nodes
 value was calculated as n/2 + 1, where n was 4.

 One other thing to note: every request is an upsert.

 What we have now is 1500 ms per upsert. This is still very high. We are
 aiming for sub-millisecond, or at worst tens of milliseconds, per document
 via bulk upload. Can this be achieved, or is it a pipe dream?
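Batching those upserts into a single _bulk request is the obvious first step: the bulk body is newline-delimited JSON with one action/metadata line plus one payload line per document, and doc_as_upsert turns each update into an upsert. A sketch (index, type, and field names are made up):

```python
# Build an ES _bulk body that upserts several documents in one HTTP
# round trip instead of one request per document.
import json

def bulk_upsert_body(docs, index="myindex", doc_type="mytype"):
    lines = []
    for doc in docs:
        # action/metadata line, then the update payload line
        lines.append(json.dumps({"update": {"_index": index,
                                            "_type": doc_type,
                                            "_id": doc["id"]}}))
        lines.append(json.dumps({"doc": doc, "doc_as_upsert": True}))
    return "\n".join(lines) + "\n"   # _bulk bodies must end with a newline

body = bulk_upsert_body([{"id": "1", "v": 1}, {"id": "2", "v": 2}])
print(body.count("\n"))  # 4 lines: one action + one payload per doc
```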



 On Tuesday, 17 December 2013 09:12:52 UTC+1, Mark Walkom wrote:

 First tip would be to drop OpenJDK and move to Oracle, you'll get a
 lot better performance.
 Bulk depends a lot on your setup and document size etc, but upwards
 of 5K is generally towards the upper limit.
 It might also be worth removing indices.ttl.interval and just using a
 script to delete old indices as TTL searches can use a fair bit of
 resources.

 You also mentioned you have 2 nodes, but there are a lot more IPs
 listed in the discovery hosts, is that intentional? Same for
 minimum_master_nodes being 3.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 17 December 2013 18:55, lekkie omotayo lekkie...@gmail.comwrote:

 Hi guys,


 We index a document in *** ms on ElasticSearch, which I think is too slow,
 especially for the amount of resources we have set up. We would like to
 index at a rate of 500 tps. Each document weighs between 20K and 30K.
 How many index requests is it advisable to issue at once (assuming we can
 afford to send multiple HTTP index requests to the server at once)? I
 understand bulk indexing is the preferred approach; for 30K documents, how
 many can be bulked at once? How many HTTP bulk requests (supposing I am
 using a multi-threaded HTTP client to make requests) is it advisable to
 make?

 I will appreciate suggestions on how to index these documents as fast as
 possible. We have two nodes set up; the config below is for one of the two:

 Shards: 5
 Replica: 1

 nodes : {
 T5l5mvIdQsW3je7WmSPOcg : {
   name : SEARCH-01,
   version : 0.90.7,
   attributes : {
 rack_id : prod,
 max_local_storage_nodes : 1
   },
   settings : {
 node.rack_id : prod,
 action.disable_delete_all_indices : true,
 cloud.node.auto_attributes : true,
 indices.ttl.interval : 90d,
 node.max_local_storage_nodes : 1,
 bootstrap.mlockall : true,
 index.mapper.dynamic : true,
 cluster.routing.allocation.awareness.attributes :
 rack_id,
 discovery.zen.minimum_master_nodes : 3,
 gateway.expected_nodes : 1,
 discovery.zen.ping.unicast.hosts :
 172.25.15.170,172.25.15.172,172.46.1.170,172.46.1.172,
 discovery.zen.ping.multicast.enabled : false,
  

Re: Serialization issues on 0.90.3

2013-12-18 Thread joergpra...@gmail.com
Just to double-check: are all ES servers and all client JVMs the same
version? It is not enough to check the cluster nodes; the client JVM is
the one to check. The exception is quite clear: it points to the
InetAddress encoding/decoding issue caused by different JVM versions...

Jörg
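A quick client-side sanity check along those lines: collect each node's JVM version (e.g. from a nodes-info call) and flag any that differ from the client's. The data shape below is a simplified stand-in, not verbatim `_nodes` output.

```python
# Sketch: flag cluster nodes whose JVM version differs from the client's,
# given a mapping of node name -> JVM version string.

def jvm_mismatches(node_versions, client_version):
    """Return (sorted) node names whose JVM version differs from the client's."""
    return sorted(name for name, v in node_versions.items()
                  if v != client_version)

nodes = {"node1": "1.6.0_18", "node2": "1.6.0_18", "node3": "1.6.0_18"}
print(jvm_mismatches(nodes, "1.6.0_65"))  # all three differ from the client
```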
