Re: increase query performance by adding more machines, shouldn't it be linear to # of machines?
yes, all same machines on which only ES with the same configuration is running

2014-07-02 14:55 GMT+09:00 David Pilato <da...@pilato.fr>:

> Are you using the same physical machine for all your VMs?
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 2 Jul 2014, at 07:09, Seungjin Lee <sweetest0...@gmail.com> wrote:

> Hi all,
>
> I'm testing percolator performance. 50k/s is the required rate, with 3-4k rules. For now I have only 1 simple rule, and 5 ES VMs with 1 shard and 4 replicas, and I am using the Java transport client like below:
>
>     new TransportClient(settings)
>         .addTransportAddresses(transportAddressList.toArray(
>             new InetSocketTransportAddress[transportAddressList.size()]));
>
> When I added just 1 address to the transport client, percolator performance was about 10k/s, and when I added all 5 VMs it was about 15k/s. So it increases only about 1.5x even though I added 4 more VM addresses. Is it supposed to be like this? What I was thinking is that if requests are distributed in, for example, round-robin fashion, there should be about a 5x performance gain. Could you comment on this?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
Re: increase query performance by adding more machines, shouldn't it be linear to # of machines?
Sorry, I meant: on how many physical bare-metal machines are your 5 VMs running?

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 2 Jul 2014, at 07:59, Seungjin Lee <sweetest0...@gmail.com> wrote:

> yes, all same machines on which only ES with the same configuration is running
> [...]
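For what it's worth, two things may be worth checking here (both assumptions on my part, not confirmed in the thread): the transport client round-robins only over the addresses it is explicitly given unless sniffing is enabled (`client.transport.sniff`), and a percolate request runs against one copy of each shard, so with 1 primary and 4 replicas a single request cannot fan out across primaries. A sketch of index settings for a re-test with more primaries (the numbers are illustrative):

```python
import json

# Hypothetical experiment: 5 primaries / 1 replica instead of
# 1 primary / 4 replicas, so each percolate request can be
# parallelized over 5 shards instead of hitting a single one.
body = {
    "settings": {
        "index": {
            "number_of_shards": 5,
            "number_of_replicas": 1,
        }
    }
}

# PUT this body to http://<node>:9200/<index-name> when creating the index.
print(json.dumps(body, indent=2))
```

Replicas still help total throughput across concurrent requests; primaries are what let one request be split up.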
Re: Dealing with spam in this forum
Hi, Clinton --

May I suggest:

- Some users (e.g., me) who read this list via an email subscription regard ANY spam on the list as an unacceptable state of affairs. This is not a problem with Apache lists, for example, so I would point the finger of blame at Google Groups.
- Having N longstanding members who are willing to help ban spammers is equivalent to having N longstanding members who are willing to quickly admit new users. (And you're welcome to add me as N+1.)
- Banning is ineffective. Spammers will continuously sign up with new accounts.

-- Paul
p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/

On Tue, Jul 1, 2014 at 11:36 AM, Clinton Gormley <clinton.gorm...@elasticsearch.com> wrote:

> Hi all
>
> Recently we've had a few spam emails that have made it through Google's filters, and there have been calls for us to change to a moderate-first-post policy. I am reluctant to adopt this policy for the following reasons:
>
> We get about 30 new users every day from all over the world, many of whom are early in their learning phase and quite stuck - they need help as soon as possible. Fortunately this list is very active and helpful. In contrast, we've only ever banned 34 users from the list for spamming. So making new users wait for timezones to swing their way feels like a heavy-handed solution to a small problem. Yes, spammers are annoying, but they are a small minority on this list.
>
> Instead, we have asked 10 of our long-standing members to help us with banning spammers. This way we have Spam Guardians active around the globe, who only need to do something if a spammer raises their ugly head above the parapet. One or two spam emails may get through, but hopefully somebody will leap into action and stop their activity before it becomes too tiresome.
>
> This isn't an exclusive list. If you would like to be on it, feel free to email me. Note: I expect you to be a long-standing and currently active member of this list to be included.
>
> If this solution doesn't solve the problem, then we can reconsider moderate-first-post, but we've managed to go 5 years without requiring it, and I'd prefer to keep things as easy as possible for new users.
>
> Clint
How to search the records with locations all in a polygon or multiPolygon?
I added the mappings below and inserted a record with 2 locations ([13, 13] and [52, 52]), and I want to search for records whose locations are ALL inside the polygon, not just one of them. Could you please tell me how to write that search?

    curl -XPOST localhost:9200/test5 -d '{
      "mappings": {
        "gistype": {
          "properties": {
            "address": {
              "properties": {
                "location": { "type": "geo_point" }
              }
            }
          }
        }
      }
    }'

    curl -XPUT 'http://localhost:9200/test5/gistype/1' -d '{
      "name": "Wind & Wetter, Berlin, Germany",
      "address": [
        { "name": "1", "location": [13, 13] },
        { "name": "2", "location": [52, 52] }
      ]
    }'

I searched like this, but it matches when any one of the locations is in the polygon, so for my purpose it's wrong:

    curl -XGET 'http://localhost:9200/test5/gistype/_search?pretty=true' -d '{
      "query": {
        "filtered": {
          "query": { "match_all": {} },
          "filter": {
            "geo_polygon": {
              "location": {
                "points": [
                  { "lat": 0,  "lon": 0 },
                  { "lat": 14, "lon": 0 },
                  { "lat": 14, "lon": 14 },
                  { "lat": 0,  "lon": 14 }
                ]
              }
            }
          }
        }
      }
    }'

@kimchy
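As far as I know, geo_polygon matches a document if any of its points falls inside the polygon, and there is no built-in "all locations inside" filter. One workaround (my suggestion, not an official recipe) is to use the geo_polygon filter only to fetch candidates, then verify every location client-side. A minimal ray-casting sketch using the document and polygon from this thread:

```python
def point_in_polygon(point, polygon):
    """Ray casting: a point is inside if a horizontal ray from it
    crosses the polygon's edges an odd number of times."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # this edge spans the ray's latitude
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def all_locations_inside(doc, polygon):
    # geo_point arrays are [lon, lat]; the square polygon used here is
    # symmetric, so the ordering does not matter for this example.
    return all(point_in_polygon(tuple(a["location"]), polygon)
               for a in doc["address"])

polygon = [(0, 0), (14, 0), (14, 14), (0, 14)]
doc = {"address": [{"name": "1", "location": [13, 13]},
                   {"name": "2", "location": [52, 52]}]}
print(all_locations_inside(doc, polygon))  # → False: [52, 52] is outside
```

This keeps the server-side filter as a cheap pre-selection and enforces the stricter "all points inside" condition in the application.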
Re: Dealing with spam in this forum
Hi,

I agree with Paul, 200%. I received at least 49 spams in my mailbox just for 06/30 — I wouldn't call that "a few spam emails". I've been subscribed to many mailing lists for years, and I'm pretty sure it would take years to get as much spam on those lists as I get in one day on the ES mailing list.

On 2 Jul 2014, at 08:18, Paul Brown <p...@mult.ifario.us> wrote:

> Hi, Clinton --
>
> May I suggest:
> - Some users (e.g., me) who read this list via an email subscription regard ANY spam on the list as an unacceptable state of affairs. This is not a problem with Apache lists, for example, so I would point the finger of blame at Google Groups.
> - Having N longstanding members who are willing to help ban spammers is equivalent to having N longstanding members who are willing to quickly admit new users. (And you're welcome to add me as N+1.)
> - Banning is ineffective. Spammers will continuously sign up with new accounts.
>
> [...]
Re: help in query
oops, there is an "it" that doesn't belong.

On 07/02/2014 09:24 AM, surfer wrote:

> That definitely helped. Thank you Vineeth.
>
> Regards
> giovanni
>
> [...]
Re: help in query
That definitely helped. Thank you Vineeth.

Regards
giovanni

On 07/01/2014 07:19 PM, vineeth mohan wrote:

> Hello Giovanni,
>
> I feel this will help -
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_literal_multi_match_literal_query_2.html#_wildcards_in_field_names
>
> Thanks
> Vineeth

On Tue, Jul 1, 2014 at 10:19 PM, surfer <sur...@crs4.it> wrote:

> Hi,
> I'm indexing something like:
>
>     first doc  = { "v4": "myvalue" }
>     second doc = { "v1": [ { "v4": "myvalue" }, { "v5": "anothervalue" } ] }
>     third doc  = { "v1": [ { "v2": [ { "v4": "myvalue" } ] } ] }
>     fourth doc = { "v1": [ { "v2": [ { "v3": [ { "v4": "myvalue" } ] } ] } ] }
>
> so nested dictionaries and arrays of dictionaries. I was wondering if there is a query to obtain all the docs that have "v4": "myvalue", with the additional condition that this must happen inside a v1 dictionary, through any number of intermediate dictionaries (none, v2, or v2 and v3). That is, with the four docs written above, my query should return: second doc, third doc and fourth doc.
>
> Any hint is appreciated
> giovanni
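The linked guide section covers wildcards in field names. If I read the question correctly, something along these lines should match v4 at any depth under v1 (a sketch under the assumption of default dynamic object mapping, not the nested type; the exact wildcard pattern is my guess at the intent):

```python
import json

# "v1.*v4" expands to every indexed field whose full path starts with
# "v1." and ends with "v4": v1.v4, v1.v2.v4, v1.v2.v3.v4, ... but not
# the top-level v4, which excludes the first doc as desired.
query = {
    "query": {
        "multi_match": {
            "query": "myvalue",
            "fields": ["v1.*v4"],
        }
    }
}
print(json.dumps(query, indent=2))
```

POST this body to the index's `_search` endpoint; field-name wildcards are expanded against the mapping at query time.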
Re: Queries with fields {...} don't return field with dot in their name
Hello Vineeth,

the items indexed in Elasticsearch really do contain a field named "response.user":

    "_source": {
      "clientip": "aaa.bbb..ddd",
      "request": "http://.aa/b/c",
      "request.accept-encoding": "gzip, deflate",
      "request.accept-language": "de-ch",
      "response.content-type": "text/html; charset=UTF-8",
      "response": "200",
      "response.age": "0",
      "response.user": "userAAA",
      "@timestamp": "2014-07-01T12:18:51.501+02:00"
    }

I realize there is an ambiguity between a field with a dot in its name and a field of a child document. Should fields with a dot in their name be avoided?

Benoît

On Tuesday, 1 July 2014 19:17:41 UTC+2, vineeth mohan wrote:

> Hello Ben,
> Can you paste a sample feed?
> Thanks
> Vineeth

On Tue, Jul 1, 2014 at 8:26 PM, benq <benoit@gmail.com> wrote:

> Hi all,
>
> I have a query that specifies the fields to be returned, as described here:
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-fields.html
>
> However, it does not return the fields with a dot in their name, like "response.user". For example:
>
>     {
>       "size": 1000,
>       "fields": ["@timestamp", "request", "response", "response.user", "clientip"],
>       "query": { "match_all": {} },
>       "filter": {
>         "and": [
>           { "range": { "@timestamp": { "from": ... } } }
>         ]
>       }
>     }
>
> The @timestamp, request, response and clientip fields are returned. response.user is not. Any idea why?
>
> Regards,
> Benoît
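To Benoît's closing question: in my experience, yes — dotted field names are best avoided, because path-based lookups cannot distinguish the field "response.user" from a "user" key inside a "response" object (later Elasticsearch versions went as far as rejecting dots in field names outright). A small pre-index normalization sketch (the underscore separator is my choice, not a convention from the thread):

```python
def de_dot(doc, sep="_"):
    """Rename dotted keys (e.g. 'response.user' -> 'response_user') so
    field names can no longer collide with object-path notation."""
    return {key.replace(".", sep): value for key, value in doc.items()}

doc = {"response": "200", "response.user": "userAAA", "response.age": "0"}
print(de_dot(doc))
# → {'response': '200', 'response_user': 'userAAA', 'response_age': '0'}
```

Applying this in the indexing pipeline (Logstash later shipped a de_dot filter for the same purpose) makes the `fields` parameter behave predictably.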
Re: limitation of 2,147,483,647 terms per segment index in Lucene
Peter, thanks so much for raising this. This looks awful! I think we should move this into an issue on [1] (please feel free to create one). IMO we should aim to frame the issue in a way that prevents this from happening altogether. Along the way we should help you recover, but I don't know how tricky that will be. Let's start with the issue!!

simon

[1] https://github.com/elasticsearch/elasticsearch/issues

On Monday, June 30, 2014 5:49:32 PM UTC+2, Peter Portante wrote:

> Is there a way to recover a segment index of a shard that has exceeded Lucene's 2^31 limit?
>
> Thanks, -peter
>
>     [2014-06-30 10:53:02,187][WARN ][indices.cluster] [Patriots] [vos][0] failed to start shard
>     org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [vos][0] failed recovery
>         at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:185)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         at java.lang.Thread.run(Thread.java:722)
>     Caused by: java.lang.IllegalArgumentException: Too many documents, composite IndexReaders cannot exceed 2147483647
>         at org.apache.lucene.index.BaseCompositeReader.<init>(BaseCompositeReader.java:77)
>         at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:369)
>         at org.apache.lucene.index.StandardDirectoryReader.<init>(StandardDirectoryReader.java:43)
>         at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:115)
>         at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:385)
>         at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:112)
>         at org.apache.lucene.search.SearcherManager.<init>(SearcherManager.java:89)
>         at org.elasticsearch.index.engine.internal.InternalEngine.buildSearchManager(InternalEngine.java:1364)
>         at org.elasticsearch.index.engine.internal.InternalEngine.start(InternalEngine.java:291)
>         at org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryPrepareForTranslog(InternalIndexShard.java:709)
>         at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:204)
>         at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
>         ... 3 more
>     [2014-06-30 10:53:02,213][WARN ][cluster.action.shard] [Patriots] [vos][0] sending failed shard for [vos][0], node[bS9Lp_a9QZOjiab23Ztk4A], [P], s[INITIALIZING], indexUUID [a0_HJrlgQq-UNCwL2QiVbg], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[vos][0] failed recovery]; nested: IllegalArgumentException[Too many documents, composite IndexReaders cannot exceed 2147483647]; ]]
>     [2014-06-30 10:53:02,213][WARN ][cluster.action.shard] [Patriots] [vos][0] received shard failed for [vos][0], node[bS9Lp_a9QZOjiab23Ztk4A], [P], s[INITIALIZING], indexUUID [a0_HJrlgQq-UNCwL2QiVbg], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[vos][0] failed recovery]; nested: IllegalArgumentException[Too many documents, composite IndexReaders cannot exceed 2147483647]; ]]
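The 2147483647 in the trace is Lucene's hard per-index document limit (2^31 - 1), and since each Elasticsearch shard is a single Lucene index, the limit applies per shard. A back-of-the-envelope sizing check (the document counts are illustrative; note that deleted-but-unmerged documents also count toward the limit):

```python
import math

# 2,147,483,647: maximum documents per Lucene index, i.e. per ES shard.
LUCENE_MAX_DOCS = 2**31 - 1

def min_shards(total_docs: int) -> int:
    """Smallest shard count that keeps every shard under the Lucene
    limit, assuming documents distribute evenly across shards."""
    return math.ceil(total_docs / LUCENE_MAX_DOCS)

print(min_shards(5_000_000_000))  # → 3
```

In practice you would want far more headroom than this bare minimum, both for skewed routing and for deleted documents awaiting merges.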
Min Hard Drive Requirements
Hi all,

I'm testing the indexing of 100 million documents; the index took about 400GB of hard drive. Is there a minimum amount of free hard-drive space needed for the index to work OK?

I'm asking because after we indexed the 100 million documents we tested the index and it worked OK, but then when we tried to optimize, the optimize took days and afterwards the index did not respond. The hard drive had only 10 GB of free space, so we tried to copy the index to a new hard drive with more free space, but the index is still not functioning.

Thank you,
Ophir
problem index date yyyy-MM-dd'T'HH:mm:ss.SSS
Hello,

I am trying to index a MySQL datetime like this: 2013-05-01 00:00:00. In ES it is represented like this: 2013-05-01T00:00:00.000Z.

The real problem seems to appear when I index the date 0000-00-00 00:00:00. I have used this mapping:

    "type": "date", "format": "yyyy-MM-dd HH:mm:ss||MM/dd/yyyy||yyyy/MM/dd", "index": "not_analyzed"

and I get this error:

    [2014-07-02 10:11:56,503][INFO ][cluster.metadata] [ik-test2] [_river] update_mapping [source] (dynamic)
    java.io.IOException: java.sql.SQLException: Value '7918-00-00 00:00:00 ... can not be represented as java.sql.Timestamp
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1078)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:989)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:975)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:920)
        at com.mysql.jdbc.ResultSetRow.getTimestampFast(ResultSetRow.java:1102)
        at com.mysql.jdbc.BufferRow.getTimestampFast(BufferRow.java:576)
        at com.mysql.jdbc.ResultSetImpl.getTimestampInternal(ResultSetImpl.java:6592)
        at com.mysql.jdbc.ResultSetImpl.getTimestamp(ResultSetImpl.java:6192)
        at com.mysql.jdbc.ResultSetImpl.getObject(ResultSetImpl.java:5058)
        at org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.processRow(SimpleRiverSource.java:590)
        at org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.nextRow(SimpleRiverSource.java:565)
        at org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.merge(SimpleRiverSource.java:356)
        at org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.execute(SimpleRiverSource.java:257)
        at org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.fetch(SimpleRiverSource.java:228)
        ... 3 more
    [2014-07-02 10:11:56,633][WARN ][org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverFlow] aborting river
    [2014-07-02 10:12:01,392][INFO ][org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverMouth] new bulk [1] of [69 items], 1 outstanding bulk requests
    [2014-07-02 10:12:01,437][INFO ][cluster.metadata] [ik-test2] [my_index] update_mapping [source] (dynamic)

Can you help me with my problem? Thank you in advance.
Re: Min Hard Drive Requirements
It will work until the disk is full, but then ES will fall over.

Merging does require a certain amount of disk space, usually the same amount as the segment that is being merged, as it has to take a copy of the shard to work on. So for a 10GB segment, you'd need at least 10GB free.

How many shards do you have for the index, and how many are you trying to optimise (merge) down to?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com

On 2 July 2014 18:13, Ophir Michaeli <ophirmicha...@gmail.com> wrote:

> Hi all,
>
> I'm testing the indexing of 100 million documents; the index took about 400GB of hard drive. Is there a minimum amount of free hard-drive space needed for the index to work OK?
>
> [...]
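Mark's rule of thumb above can be turned into a quick pre-flight check before running an optimize. A sketch (the 1x factor is the thread's rule of thumb; optimizing a whole index down to one segment can need free space on the order of the entire shard):

```python
def merge_headroom_ok(free_bytes: int, merge_input_bytes: int,
                      factor: float = 1.0) -> bool:
    """A merge rewrites its input segments into a new segment before
    deleting the old files, so roughly their combined size must be
    free on disk while the merge runs."""
    return free_bytes >= factor * merge_input_bytes

GB = 2**30
# 10 GB free while optimizing a ~400 GB index down to one segment:
print(merge_headroom_ok(10 * GB, 400 * GB))  # → False
```

This matches the failure described in the thread: with only 10 GB free, a large merge has nowhere to write its output.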
One date field mapping for two different locale
Here's the problem. I have data with a date field that can be in either the English or the German date format (or rather, week- and month-naming convention), e.g. "Mittwoch, 18. Juni 2012" or "Wednesday, 18. June 2012".

I can set up separate mappings with separate fields for each nation's dates:

    { "website": { "properties": { "date_en": { "type": "date", "format": "EEE, dd. MMM yyyy", "locale": "US" } } } }

    { "website": { "properties": { "date_de": { "type": "date", "format": "EEE, dd. MMM yyyy", "locale": "DE" } } } }

And this works properly, until I try to put an English date into the German date field, or vice versa. I do not have the option of receiving additional data carrying the date's locale. What I need is a way to recognize each type of date, save it internally as a timestamp (for sorting), and do that in one and the same field (because of sorting and field-naming conventions). Something like this:

    { "website": { "properties": { "date": { "type": "date", "format": "EEE, dd. MMM yyyy", "locale": "US||DE" } } } }

What I want to achieve is to be able to sort all documents, with their different countries' dates, by date. Maybe there is a different approach; I'll gladly try other solutions.

--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/One-date-field-mapping-for-two-different-locale-tp4059118.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.
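As far as I know, the date mapping's `locale` takes a single locale, so "US||DE" will not work. One alternative is to normalize at index time: translate German day/month names to English, parse once, and index the ISO result into a plain date field. A sketch (the translation-table approach is my own workaround, not an Elasticsearch feature):

```python
import re
from datetime import datetime

# Map German day/month names to English so one parser handles both
# locales. Names identical in both languages (April, August, September,
# November) are omitted on purpose.
DE_TO_EN = {
    "Montag": "Monday", "Dienstag": "Tuesday", "Mittwoch": "Wednesday",
    "Donnerstag": "Thursday", "Freitag": "Friday", "Samstag": "Saturday",
    "Sonntag": "Sunday",
    "Januar": "January", "Februar": "February", "März": "March",
    "Mai": "May", "Juni": "June", "Juli": "July",
    "Oktober": "October", "Dezember": "December",
}

def parse_either_locale(text: str) -> str:
    # Whole-word replacement so English input passes through untouched.
    text = re.sub(r"\w+", lambda m: DE_TO_EN.get(m.group(0), m.group(0)), text)
    return datetime.strptime(text, "%A, %d. %B %Y").date().isoformat()

print(parse_either_locale("Mittwoch, 18. Juni 2012"))   # → 2012-06-18
print(parse_either_locale("Wednesday, 18. June 2012"))  # → 2012-06-18
```

The normalized ISO string can then go into a single "yyyy-MM-dd"-formatted date field, which sorts correctly regardless of the source locale.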
Shards considered in write consistency
Hey, I have a question related to write consistency. I have an Elasticsearch cluster with 2 nodes. The index is configured with number_of_shards = 5 and number_of_replicas = 1. If I set the action.write_consistency value to `quorum`, what is the number of active shards required to satisfy the quorum? Please shed some light on the matter. Thanks.
Re: problem index date yyyy-MM-dd'T'HH:mm:ss.SSS
When I index the date '0000-00-00 00:00:00', the indexing stops completely with an error (it begins working, then stops instantly). I have tried mapping the field as type "string", but that doesn't work. Do you have an idea how to solve my problem?
Re: problem index date yyyy-MM-dd'T'HH:mm:ss.SSS
Hi Tanguy, How is this a valid date string: java.io.IOException: java.sql.SQLException: Value '7918-00-00 00:00:00'? This value can't be mapped to any date format, nor is it valid in any way. Thanks, Vineeth
Re: problem index date yyyy-MM-dd'T'HH:mm:ss.SSS
What you can do is set the mapping for the date field to:

{ "type" : "date", "format" : "yyyy-MM-dd HH:mm:ss", "ignore_malformed" : true }

Then it will just ignore those invalid dates rather than throwing an error.
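Applied through the mapping API, that could look like the following sketch (the index, type, and field names "test", "source", and "date_source" are taken from later messages in this thread):

```
PUT /test/source/_mapping
{
  "source": {
    "properties": {
      "date_source": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss",
        "ignore_malformed": true
      }
    }
  }
}
```

Note that `ignore_malformed` drops the bad value for that document rather than rejecting the whole document, so those rows will simply have no `date_source` when sorting.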
Re: problem index date yyyy-MM-dd'T'HH:mm:ss.SSS
In my MySQL table (type: datetime):

| date_source         |
+---------------------+
| 2008-09-15 18:29:07 |
| 2013-08-29 00:00:00 |
| 2013-07-04 00:00:00 |
| 2013-07-17 00:00:00 |
| 2013-07-17 00:00:00 |
| 0000-00-00 00:00:00 |
...

If I use a mapping (type: string) and I index:

PUT /_river/test/_meta
{
  "type" : "jdbc",
  "jdbc" : {
    "url" : "jdbc:mysql://ip:port/database",
    "user" : "user",
    "password" : "password",
    "sql" : "select id_source as _id, title_source, date_source from source",
    "index" : "test",
    "type" : "source",
    "max_bulk_requests" : 5
  }
}

(If I add "where date_source not like '0000-00-00%'" to the sql, it works, but values are then missing for that date.)
Index missing error, Elasticsearch Java
Hi, I am new to Elasticsearch. I am using the Java API to establish a connection with ES.

public void createIndex(final String index) {
    getClient().admin().indices().prepareCreate(index).execute().actionGet();
}

public void createLocalCluster(final String clusterName) {
    NodeBuilder builder = NodeBuilder.nodeBuilder();
    Settings settings = ImmutableSettings.settingsBuilder()
        .put("gateway.type", "none")
        .put("cluster.name", clusterName)
        .build();
    builder.settings(settings).local(false).data(true);
    this.node = builder.node();
    this.client = node.client();
}

public boolean existsIndex(final String index) {
    IndicesExistsResponse response = getClient().admin().indices().prepareExists(index).execute().actionGet();
    return response.isExists();
}

public void openIndex(String name) {
    getClient().admin().indices().prepareOpen(name).execute().actionGet();
}

createLocalCluster("cerES");
createIndex("news");
System.out.println(existsIndex("news"));

When I execute the above Java code I get a "true" response. But when I close the Java program and start it again with the following code:

openIndex("news");

it throws an IndexMissingException, even though I can see the "news" index in the data folder in Eclipse. So how do I retrieve the data from the node created previously? Is it lost, or am I wrong somewhere?
Re: Dealing with spam in this forum
I've received at least 49 spam messages in my mailbox just for 06/30. I wouldn't call that "a few spam emails". I've been subscribed to many mailing lists for years, and I'm pretty sure it would take years to get as much spam on those lists as I get in one day on the ES mailing list.

That's interesting... I'd only seen three spam emails, so I wondered where you got 49 from. I read the posts from my gmail account, so I checked my spam folder and, sure enough, there were a lot more emails in there that I was unaware of. I'm going to disable my spam filter for this group so that I get more visibility, and I'd ask the other moderators to do the same. Let's see how it goes for a while longer. We can always revisit this decision later on. clint
Re: problem index date yyyy-MM-dd'T'HH:mm:ss.SSS
What is this date supposed to represent? month = 0 or day = 0 does not exist, right? -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet | @elasticsearchfr
Re: Shards considered in write consistency
In your case, quorum means that you need all primaries to be allocated, which is the case here. The docs explain this very well: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html#index-consistency Have a look in detail at: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_how_primary_and_replica_shards_interact.html#_how_primary_and_replica_shards_interact and http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/distrib-write.html HTH -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet | @elasticsearchfr
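For reference, the quorum size given in the linked docs is int((primary + number_of_replicas) / 2) + 1, and the quorum requirement is only enforced when number_of_replicas is greater than 1, which is why one active copy per shard (the primary) suffices for the cluster in this thread. As a small arithmetic sketch:

```python
def write_quorum(number_of_replicas):
    """Active shard copies required per shard for write_consistency=quorum.

    Formula from the Elasticsearch reference: int((primary + replicas) / 2) + 1,
    enforced only when number_of_replicas > 1 (otherwise a single-node
    cluster could never index anything).
    """
    if number_of_replicas <= 1:
        return 1  # only the primary is required
    return (1 + number_of_replicas) // 2 + 1
```

So with number_of_replicas = 1 (as here), each write needs only the primary; with 3 replicas it would need 3 of the 4 copies.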
Is there a way to use highlight and fuzzy together?
Hello. Everything is in the subject. I have to use fuzzy matching on my fields (title, content), and when I'm searching I want to see the part of the sentence where my keyword is. This combination doesn't work:

$params['body']['highlight']['fields'][$value]['fragment_size'] = 30;
$params['body']['query']['fuzzy'] = 0.2;

Is there a way to use highlight and fuzzy together, or another equivalent way? Thank you in advance.
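One shape that may work, not from the thread: a match query with the `fuzziness` option combined with a `highlight` section in the same request body (a sketch; the field name "content" and the search text are assumptions):

```
{
  "query": {
    "match": {
      "content": { "query": "keyword", "fuzziness": 0.2 }
    }
  },
  "highlight": {
    "fields": {
      "content": { "fragment_size": 30 }
    }
  }
}
```

In the PHP client, that whole structure would go under `$params['body']`, rather than setting `fuzzy` directly as a query value.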
Re: problem index date yyyy-MM-dd'T'HH:mm:ss.SSS
This date is set when a document is created, but an error occurred and I got this 0000-00-00. ^^ The company I work for has existed for 10 years; the database is old and contains this kind of error. For the moment, I will either use:

"sql" : "select id_source as _id, title_source, date_source from source where date_source not like '0000-00-00%'"

(which works, but values are then missing for that date), or not index date_source at all. My goal was to sort my results by date_source.
Re: problem index date yyyy-MM-dd'T'HH:mm:ss.SSS
I would recommend updating the SQL database! :) So maybe update all dates where the date is 0000-00-00 to 1970-01-01, if that fits your use case. -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet | @elasticsearchfr
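If fixing the data is not possible, another option (assuming the JDBC river connects through MySQL Connector/J, which the jdbc:mysql URL in this thread suggests) is the driver's `zeroDateTimeBehavior` connection property, which makes the driver return NULL for zero dates instead of raising an exception:

```
jdbc:mysql://ip:port/database?zeroDateTimeBehavior=convertToNull
```

That keeps all rows flowing into the index; the affected documents would simply have no date_source value.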
[ANN] ElasticUI AngularJS Directives - Easily Build an Interface on top of Elasticsearch
Hi all, I just open sourced a set of AngularJS directives for Elasticsearch. It enables developers to rapidly build a frontend (e.g. a faceted search engine) on top of Elasticsearch. http://www.elasticui.com (or GitHub: https://github.com/YousefED/ElasticUI) It makes creating an aggregation and listing the buckets as simple as:

<ul eui-aggregation="ejs.TermsAggregation('text_agg').field('text').size(10)">
  <li ng-repeat="bucket in aggResult.buckets">{{bucket}}</li>
</ul>

I think this was missing in the ecosystem, which is why I decided to build and open source it. I'd love any kind of feedback. - Yousef

Another example: add a checkbox facet based on a field using one of the built-in widgets (https://github.com/YousefED/ElasticUI/blob/master/docs/widgets.md):

<eui-checklist field="facet_field" size="10"></eui-checklist>

[image: checklist screenshot]
Re: _all analyzer advice
Ah. Cheers. I had looked at that page a few times but missed that.

On Tuesday, 1 July 2014 19:04:56 UTC+1, Glen Smith wrote: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-analyzers.html

On Tuesday, July 1, 2014 6:23:54 AM UTC-4, mooky wrote: Thanks. So "default_index" and "default_search" have special meaning. Is this in the docs anywhere? -N

On Monday, 30 June 2014 17:21:40 UTC+1, Glen Smith wrote: Totally. For example:

"analyzer": {
  "default_index": { "tokenizer": "standard", "filter": ["standard", "lowercase"] },
  "default_search": { "tokenizer": "standard", "filter": ["standard", "lowercase", "stop"] }
}

On Monday, June 30, 2014 12:19:55 PM UTC-4, mooky wrote: Excellent. Thanks for the info. Is it possible to set my custom analyser as the default analyser for an index (i.e. instead of the standard analyzer)? -N

On Monday, 30 June 2014 14:41:10 UTC+1, Glen Smith wrote: You can set up analysers for your index...

"my-index": {
  "analysis": {
    "analyzer": {
      "default_index": { "tokenizer": "standard", "filter": ["standard", "icu_fold_filter", "stop"] },
      "default_search": { "tokenizer": "standard", "filter": ["standard", "icu_fold_filter", "stop"] },
      "custom_index": { "tokenizer": "whitespace", "filter": ["lower"] },
      "custom_search": { "tokenizer": "whitespace", "filter": ["lower"] }
    }
  }
}

... and then map your relevant field accordingly:

{
  "_timestamp": { "enabled": true, "store": "yes" },
  "properties": {
    "my_field": { "type": "string", "index_analyzer": "custom_index", "search_analyzer": "custom_search" }
  }
}

Note that you can (and often should) set up index analysis and search analysis differently (e.g. if you use synonyms, only expand search terms). Hope I haven't missed the point...

On Monday, June 30, 2014 8:47:36 AM UTC-4, mooky wrote: Hi all, I have a google-style search capability in my app that uses the _all field with the default (standard) analyzer (I don't configure anything, so it's Elastic's default). There are a few cases where we don't quite get the behaviour we want, and I am trying to work out how to tweak the analyzer configuration. 1) If the user searches using 99.97, they get the results they expect, but if they search using 99.97%, they get nothing. They should get the results that match 99.97%. The default analyzer config loses the %, I guess. 2) I have no idea what the text is ( : ) ) but the user wants to search using 托克金通贸易 - which is in the data - but currently we get zero results. It looks like the standard analyzer/tokenizer breaks on each character. I *think* I just want a whitespace analyzer with lower-casing. However, a) I am not exactly sure how to configure that, and b) I am not 100% sure what I am losing/gaining vs. the standard analyzer. (Don't need stop-words - in any case, the default config for the standard analyser doesn't have any, IIRC.) (FWIW, on all our other text fields, we tend to use no analyzer.) (Elastic 1.1.1 and 1.2 ...) Cheers. -M
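Putting the pieces of this thread together, the whitespace-plus-lowercase setup mooky describes could be set as the index default in the index-creation body, along these lines (a sketch; the index name is assumed, and in 1.x an analyzer named "default" is used for both indexing and search unless default_index/default_search override it):

```
PUT /my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
```

Compared to the standard analyzer, this keeps tokens like "99.97%" intact (no punctuation stripping), at the cost of not splitting CJK text at all, so the 托克金通贸易 case would only match as a whole whitespace-delimited token.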
Re: problem index date yyyy-MM-dd'T'HH:mm:ss.SSS
Yes, it's just a few dates. I think they can be updated quickly. That's the better way. :) Thank you all.
Re: [ANN] ElasticUI AngularJS Directives - Easily Build an Interface on top of Elasticsearch
Very cool, I'll pass this on to some of our devs :) Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com
spellcheck and completion suggester or what?
Hi group, I have a special problem which I'm trying to solve. I need search suggestions while the user is typing text into a search box. I tried different settings and options with ES, including the term suggester, the completion suggester, and so on, but with no success. What I'm looking for: if I have already typed "dan" into the search box, I should get suggestions like:

{
  "responseHeader": { "status": 0, "QTime": 11 },
  "spellcheck": {
    "suggestions": [
      [
        "dan",
        {
          "numFound": 9,
          "startOffset": 0,
          "endOffset": 3,
          "suggestion": ["dana", "danach", "danckert", "dando", "danger", "dangos", "danguolė", "daniel", "danish"]
        }
      ]
    ]
  }
}

So, just the suggestions in alphabetical order from the index. The above example is from Solr, but I need this feature for ES. Any idea how to achieve this? Regards, Bernd
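For what it's worth, a completion-suggester setup along these lines might get close (a sketch, not from the thread; the index, type, and field names "myindex", "word", and "suggest" are made up):

```
PUT /myindex/word/_mapping
{
  "word": {
    "properties": {
      "suggest": { "type": "completion" }
    }
  }
}

POST /myindex/_suggest
{
  "word-suggest": {
    "text": "dan",
    "completion": { "field": "suggest" }
  }
}
```

The completion suggester returns prefix matches from an in-memory FST, which is the usual fit for search-as-you-type; note it ranks by weight rather than alphabetically, so alphabetical order would need to be imposed client-side or via per-entry weights.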
Something I am finding difficult, using Aggregations
Having used Elastic aggregations for a little while (and having used Mongo aggregations previously), I have been finding a couple of things a bit difficult/awkward. I am not sure if it's because I don't know how to do it properly, or whether we are missing a feature/enhancement in Elastic. A common thing I want to do is aggregate on field x, but in the result I also want fields y and z (which are unique for a given x) - there doesn't seem to be an easy way to do that. Let's say I have some data:

{
  "id": "94538ef6-2998-4ddd-be00-1f5dc2654955",
  "quantity": 1234567.2342,
  "commodityId": "0e918fb8-6572-4663-a692-cbebe8aca7f2",
  "commodityName": "Lead",
  "ownerId": "53e0f816-8a0a-4659-b868-c48035676b25",
  "ownerName": "Simon Chan",
  "locationId": "1cdd4bc7-76d9-43fb-ac56-8f555164211a",
  "locationName": "Shenyang - Shenyang Dongbei",
  "locationCode": "W33",
  "locationCity": "Shenyang",
  "locationCountry": "China"
}

Let's say I want to do a terms aggregation on ownerId (because it's unique, while ownerName obviously is not). I will get results where the bucket key is the id. However, what I want to display to the user is the ownerName, not the id. Looking up the name from the id could be very expensive - but it's also unnecessary, because the name will be unique for a given bucket - we have the information at hand in the index. The same issue arises if I want to aggregate by locationId or commodityId. We dereference the data associated with an id so that we can search on it - but we also want to use this information to create a label for a bucket when we aggregate. Is there a simple way to retrieve ownerName while aggregating on ownerId? The only way I know of is to: a) make sure ownerName is not_analyzed, and b) do a terms sub-aggregation - which will give only one result. Is there an easier way that I have missed? (FWIW, doing the same thing in, say, a Mongo aggregation is simply a matter of adding ownerName as a key field - since it's unique for a given id, it won't change the aggregation results - the ownerName info is simply extracted from the key data in the result.) Cheers, M
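The workaround described above (ownerName kept not_analyzed, plus a single-bucket terms sub-aggregation) would look roughly like this as a request body (a sketch of the poster's own approach, not a confirmed recommendation):

```
{
  "size": 0,
  "aggs": {
    "by_owner": {
      "terms": { "field": "ownerId" },
      "aggs": {
        "owner_name": { "terms": { "field": "ownerName", "size": 1 } }
      }
    }
  }
}
```

Each "by_owner" bucket then carries a one-bucket "owner_name" sub-aggregation whose key is the display label.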
Re: Realtime search + fast indexing
One thing you can consider is calling refresh() after indexing - which has the effect I think you are looking for. There are probably some performance considerations others here can comment on better than I. In any case, calling refresh() is what we do. On Thursday, 26 June 2014 10:25:12 UTC+1, Nico Krijnen wrote: Hi, We have recently migrated our application from 'bare Lucene + Zoie for realtime search' to Elastic Search. Elastic search is awesome and next to scalability, it gives us lots of additional features. The one thing we really miss though is realtime search. Search is the core of our application. All our data is stored in the index (primary data store). When a user adds a file or makes a change, their subsequent search must reflect that change. With Zoie, the data was indexed very quickly into a temporary Lucene memory index. Not having to write+read it on disk makes the documents available for search much faster than NRT Lucene. The memory index is flushed to disk asynchrounously from time to time, not impacting indexing or search performance. Zoie also allows you to wait for a specific 'version of the index' to be available for searching. That way we could make the user's thread wait until their data was indexed in memory, only pausing the thread of that user without having any performance impact for all the other users. Result: realtime search and insanely fast indexing. With Elastic Search we have to do a refresh to make data available for search. Lots of refreshes or the 1 second refresh interval will cause significant slower indexing speed. We don't know beforehand when our users will import documents or make lots of changes, so we cannot really increase the refresh interval when needed to make indexing faster. We know that 'get' is realtime and we make use of that as much as possible, but in lots of cases we really require a search to find the data. 
Our plan is to implement some mechanism in Elasticsearch to get the same realtime search + fast indexing behavior that we had with Zoie. We need some pointers, though, on what would be the best place in Elasticsearch to do something like this. After all, it hooks into low-level Elasticsearch and Lucene internals. I can imagine that 'realtime search while indexing' is important for many other Elasticsearch users too. What are the chances of something like this getting merged back into the main branch? I'm planning to be at the Friday drinks tomorrow in Amsterdam. Is there anyone attending with whom I could do some sparring on this matter? Thanks, Nico
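The refresh semantics under discussion - documents are indexed into an in-memory buffer and only become searchable after a refresh - can be illustrated with a toy model. This is a simplified sketch of the behavior only, not the Elasticsearch API:

```python
class ToyIndex:
    """Toy model of near-realtime search: docs are buffered until refresh()."""

    def __init__(self):
        self.buffer = []      # indexed, but not yet visible to searches
        self.searchable = []  # visible to searches

    def index(self, doc):
        self.buffer.append(doc)

    def refresh(self):
        # Make everything indexed so far visible to searches.
        self.searchable.extend(self.buffer)
        self.buffer.clear()

    def search(self, term):
        return [d for d in self.searchable if term in d]


idx = ToyIndex()
idx.index("realtime search")
assert idx.search("realtime") == []                    # not yet refreshed
idx.refresh()
assert idx.search("realtime") == ["realtime search"]   # visible after refresh
```

This is why calling refresh() right after indexing, as suggested at the top of the thread, makes a user's own writes visible to their next search, at the cost of extra refresh overhead.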
cluster.routing.allocation.enable behavior (sticky shard allocation not working as expected)
Hi, I am trying to use cluster.routing.allocation.enable to speed up node restarts. As I understand it, if I set cluster.routing.allocation.enable to none, restart a node, then set cluster.routing.allocation.enable to all, the shards that go UNASSIGNED when the node goes down should start back up on the same node they were assigned to previously. But in practice, when I do this, the shards get assigned across the entire cluster when I set cluster.routing.allocation.enable back to all, and then after that, some amount of rebalancing happens. How can I avoid this, and make shards on a restarted node come back on the same node? To be clear, here's exactly the sequence of events:

1) curl -XPUT -s $host:$port/_cluster/settings?pretty=1 -d '{"persistent":{"cluster.routing.allocation.enable":"none"}}'
2) service elasticsearch stop on one node of a 3-node cluster (discovery.zen.minimum_master_nodes: 2)
3) shards that were assigned to the now-stopped node show as UNASSIGNED
4) service elasticsearch start on the same node as in (2)
5) wait a few minutes - shards mentioned in (3) still show as UNASSIGNED, each node sees the full cluster (/_cat/nodes)
6) curl -XPUT -s $host:$port/_cluster/settings?pretty=1 -d '{"persistent":{"cluster.routing.allocation.enable":"all"}}'
7) UNASSIGNED shards mentioned in (3) begin being assigned across all nodes in the cluster
8) After all UNASSIGNED shards are assigned, some start rebalancing (migrating to other nodes)
9) Cluster is happy

The amount of data in this cluster is very large, and this process can take close to 24 hours, so I'd like very much to avoid that for routine restarts. Thanks. Andy
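The settings bodies in steps (1) and (6) need quoted JSON keys and values; a small sketch that builds the request payloads (the endpoint and the persistent scope are taken from the post above, everything else is illustrative):

```python
import json

def allocation_settings(value, scope="persistent"):
    """Build the _cluster/settings body to toggle shard allocation.

    value is one of the documented allocation modes; scope chooses
    between "persistent" and "transient" settings.
    """
    assert value in ("all", "primaries", "new_primaries", "none")
    return json.dumps({scope: {"cluster.routing.allocation.enable": value}})

# Bodies that would be PUT to $host:$port/_cluster/settings
disable = allocation_settings("none")
enable = allocation_settings("all")
print(disable)
print(enable)
```

Building the body with json.dumps avoids the unquoted-key form, which some JSON parsers reject.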
Re: Dealing with spam in this forum
I fall on the side of caring less about spam emails (since I have a decent spam filter on my email) and would rate easy access to the group much higher. I tend to add/remove myself from groups all the time - so adding a delay to adding myself to a group would be a big PITA for me. -M

On Wednesday, 2 July 2014 11:34:05 UTC+1, Clinton Gormley wrote:

I've received in my mailbox at least 49 spams just for the 06/30. I won't call this a few spam emails. I've been subscribed for years to many mailing lists, and I'm pretty sure that it would take years to get as much spam on those lists as I get in 1 day on the ES mailing list.

That's interesting... I'd only seen three spam emails, so I wondered where you got 49 from. I read the posts from my gmail account, so then I checked my spam folder and sure enough there were a lot more emails in there that I was unaware of. I'm going to disable my spam filter for this group so that I get more visibility, and I'd ask the other moderators to do the same. Let's see how it goes for a while longer. We can always revisit this decision later on. clint
Looking to build a logging solution with threshold alerting.
I am looking to build a logging solution and wanted to make sure that I am not missing any key components. The logs that I have are currently stored in a database to which there is limited access due to locking risks from bad queries. My plan is to have the DBAs write the logs from the database tables to a file on a set interval, then have Logstash pick up the logs and write them to Elasticsearch. Then, for viewing/searching the logs, I will be using Kibana.

Everything up to this point I have been able to make a proof of concept for, but the other request was to have alerting. I have spent some time looking at this, and the general response seems to be to use percolation, but that only makes sense if you want to send an alert when you receive a single error that matches a query; from what I have seen, there is no way to build a threshold alerting system using percolation.

My thought to solve the threshold alerting is to create a simple web UI that allows the user to enter a query to search for, a threshold, a time frame, and email addresses to send the alert to, all stored in Elasticsearch. Then an app (running as a Windows service or cron job) would pull the alerts, run the queries, and check the time frame and threshold on some interval. If the count surpasses the threshold, it would send an email to the stored addresses. I know that SPM seems to cover this and more, but we are currently looking to see if we can do this without buying another product. Is this the correct approach to take, or should I be looking at doing something else?
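The polling approach described above - run the stored query on an interval, count hits inside the time frame, compare to the threshold, and email on breach - reduces to a small decision function. The names and the in-memory timestamp list here are illustrative, not an existing API; in practice the count would come from an Elasticsearch count query:

```python
from datetime import datetime, timedelta

def should_alert(hit_timestamps, threshold, window, now=None):
    """True when the number of hits inside the trailing window exceeds threshold."""
    now = now or datetime.utcnow()
    recent = [t for t in hit_timestamps if now - t <= window]
    return len(recent) > threshold

# Example: three hits in the last hour, one outside the window.
now = datetime(2014, 7, 2, 12, 0, 0)
hits = [now - timedelta(minutes=m) for m in (1, 2, 3, 90)]
assert should_alert(hits, threshold=2, window=timedelta(hours=1), now=now)
assert not should_alert(hits, threshold=3, window=timedelta(hours=1), now=now)
```

The scheduled app would run this check per stored alert definition and send email only when the function returns True.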
Re: get rid of _all to optimize storage and perfs (Re: Splunk vs. Elastic search performance?)
Patrick, * Well, I did answer your question. But probably not from the direction you expected. hmm no, you didn't. My question was: it looks like I can't retrieve/display [_all field] content. Any idea? and you replied with your logstash template where _all is disabled. I'm interested in disabling _all, but that was not my question at this point.*

Fair enough. I don't know the inner details; I am just an enthusiastic end user. To the best of my knowledge, there is no content for the _all field; I view it as an Elasticsearch pseudo field whose name is _all and whose index terms are taken from all fields (by default), but there is still no actual content for it. And after I got into the habit of disabling the _all field, my hands-on exploration of its nuances ended. It's time for the experts to explain!

*Your answer to my second message, below, is informative and interesting but fails to answer my second question too. I simply asked whether I need to feed the complete modified mapping of my template or if I can just push the modified part (i.e. the "_all": {"enabled": false} part).*

Again, I have never done this, so I can only tell you what I do. I just cannot tell you all the nuances of what Elasticsearch is capable of. My recommendation is to try it. Elasticsearch is great at letting you experiment and then telling you clearly whether your attempt succeeds or fails. So, try your scenario. If it fails, then it didn't work or you did something wrong. If it succeeds, then you can see exactly what Elasticsearch actually accepted as your mapping. For example:

curl 'http://localhost:9200/logstash-2014.06.30/_mapping?pretty=true'
echo

This particular query looks at one of my logstash-generated indices, and it lets me verify that Elasticsearch and Logstash conspired to create the mappings I expected. I used this command quite a bit until I finally got everything configured correctly.
(I actually verify the mapping via Elasticsearch Head, but under the covers it's the same command.) Brian
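On the second question - pushing only the modified part - the body one might try looks like the snippet below. Whether a partial mapping update is accepted is exactly the open question in the thread (the advice above is to try it and inspect the result); the type name "mytype" is hypothetical:

```python
import json

# Partial mapping body disabling _all for one type; hypothetical type name.
partial_mapping = {"mytype": {"_all": {"enabled": False}}}
body = json.dumps(partial_mapping)
print(body)
```

If Elasticsearch accepts it, the follow-up `_mapping?pretty=true` query shown above reveals what was actually merged into the full mapping.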
Problem accessing fields from a native script during percolation
Hi all, We're trying to figure out how to access fields from within a native AbstractSearchScript when it's called from a percolate request that contains the document to percolate. We tried to use the source mechanism and stored fields, to no avail (no errors, but no matches). The same scripts are working fine for classic searches. This was tried with Elasticsearch release 1.1.1 and a snapshot of 2.0.0. We're running out of ideas; any help would be really appreciated. Thanks!

curl -XPOST http://localhost:9200/index1

curl -XPOST http://localhost:9200/index1/mytype/_mapping -d '{
  "properties": {
    "source_field": { "type": "string" },
    "stored_field": { "type": "string", "store": true }
  }
}'

curl -XPUT http://localhost:9200/index1/.percolator/1 -d '{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "script": {
          "script": "cooccurenceScript",
          "params": { "map": { "list": [ "a" ] } },
          "lang": "native"
        }
      }
    }
  }
}'

curl -XPUT http://localhost:9200/index1/.percolator/2 -d '{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "script": {
          "script": "cooccurenceStoredScript",
          "params": { "map": { "list": [ "a" ] } },
          "lang": "native"
        }
      }
    }
  }
}'

Native scripts:

package test;

import java.util.Map;
import org.elasticsearch.common.Nullable;
import org.elasticsearch.common.component.AbstractComponent;
import org.elasticsearch.common.inject.Inject;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.node.Node;
import org.elasticsearch.script.ExecutableScript;
import org.elasticsearch.script.NativeScriptFactory;

public class CooccurenceScriptFactory extends AbstractComponent implements NativeScriptFactory {
    private final Node node;

    @SuppressWarnings("unchecked")
    @Inject
    public CooccurenceScriptFactory(Node node, Settings settings) {
        super(settings);
        this.node = node;
    }

    @Override
    public ExecutableScript newScript(@Nullable Map<String, Object> params) {
        return new CooccurenceScript(node.client(), logger, params);
    }
}

package test;

import java.util.List;
import java.util.Map;
import org.elasticsearch.ElasticsearchIllegalArgumentException;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.Nullable;
import org.elasticsearch.common.logging.ESLogger;
import org.elasticsearch.common.xcontent.support.XContentMapValues;
import org.elasticsearch.script.AbstractSearchScript;
import org.elasticsearch.search.lookup.SourceLookup;

public class CooccurenceScript extends AbstractSearchScript {
    private List<String> list = null;

    @SuppressWarnings("unchecked")
    public CooccurenceScript(Client client, ESLogger logger, @Nullable Map<String, Object> params) {
        Map<String, Object> map = params == null ? null : XContentMapValues.nodeMapValue(params.get("map"), null);
        if (map == null) {
            throw new ElasticsearchIllegalArgumentException("Missing the map parameter");
        }
        list = (List<String>) map.get("list");
        if (list == null || list.isEmpty()) {
            throw new ElasticsearchIllegalArgumentException("Missing the list parameter or list is empty");
        }
    }

    @Override
    public Object run() {
        SourceLookup source = source();
        @SuppressWarnings("unchecked")
        List<Object> values = (List<Object>) source.get("source_field");
        if (values == null || values.isEmpty()) {
            return false;
        }
        for (Object localValue : values) {
            boolean result = true;
            for (String s : list) {
                result = ((String) localValue).contains(s);
            }
            if (result) {
                return true;
            }
        }
        return false;
    }
}

package test;

import java.util.Map;
import org.elasticsearch.common.Nullable;
import org.elasticsearch.common.component.AbstractComponent;
import org.elasticsearch.common.inject.Inject;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.node.Node;
import org.elasticsearch.script.ExecutableScript;
import org.elasticsearch.script.NativeScriptFactory;

public class CooccurenceStoredScriptFactory extends AbstractComponent implements NativeScriptFactory {
    private final Node node;

    @SuppressWarnings("unchecked")
    @Inject
    public CooccurenceStoredScriptFactory(Node node, Settings settings) {
        super(settings);
        this.node = node;
    }

    @Override
    public ExecutableScript newScript(@Nullable Map<String, Object> params) {
        return new CooccurenceStoredScript(node.client(), logger, params);
    }
}

package test;

import org.elasticsearch.ElasticsearchIllegalArgumentException;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.Nullable;
import org.elasticsearch.common.logging.ESLogger;
import org.elasticsearch.common.xcontent.support.XContentMapValues;
import
Re: have we a way to use highlight and fuzzy together ?
On Wed, Jul 2, 2014 at 6:47 AM, Tanguy Bernard bernardtanguy1...@gmail.com wrote: Hello. Everything is in the subject. I have to use fuzzy for my fields (title, content), and when I'm searching I want to see the part of the sentence where my keyword is. This, together, doesn't work:

$params['body']['highlight']['fields'][$value]['fragment_size']=30;
$params['body']['query']['fuzzy']=0.2;

Is there a way to use highlight and fuzzy together, or another equivalent way?

Usually it's better to show a recreation with curl; PHP isn't always understood. Vocabulary point: fuzzy, prefix, and regex queries are called multi-term queries. Anyway, there are three highlighters built in to Elasticsearch, all of which have different feature sets. I'm not sure if the plain highlighter supports multi-term queries, but you can try the fast vector highlighter or the postings highlighter, which do support multi-term queries. See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-highlighting.html For completeness' sake I should mention that I maintain a fourth highlighter that also supports multi-term queries, but it is a plugin: https://github.com/wikimedia/search-highlighter Nik
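As a curl-style recreation of what the PHP above is building, a request body combining a fuzzy query with highlighting might look like the sketch below. The field name "content" and the search value are illustrative; "type": "fvh" selects the fast vector highlighter suggested above, which requires the field to be indexed with term vectors:

```python
import json

# Illustrative search body: fuzzy query plus a 30-character highlight
# fragment, using the fast vector highlighter for multi-term support.
body = {
    "query": {
        "fuzzy": {
            "content": {"value": "keyword", "fuzziness": 0.2}
        }
    },
    "highlight": {
        "fields": {
            "content": {"fragment_size": 30, "type": "fvh"}
        }
    },
}
print(json.dumps(body, indent=2))
```

The printed JSON is what would be POSTed to the index's _search endpoint with curl.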
Re: Dealing with spam in this forum
The behavior of my gmail-operated spam filter has been to toss out lots of emails from this list as false positives. So, I keep sending them back to my inbox; pretty soon, gmail asks me to forward the good ones to them to study, so I do. The result of that is that they catch NONE of those spams. They also don't put enough information in the header to allow me to see if all those spams come from the same IP address. Otherwise, it might be possible for the group list to block certain IP addresses.

On Wed, Jul 2, 2014 at 3:34 AM, Clinton Gormley cl...@traveljury.com wrote:

I've received in my mailbox at least 49 spams just for the 06/30. I won't call this a few spam emails. I've been subscribed for years to many mailing lists, and I'm pretty sure that it would take years to get as much spam on those lists as I get in 1 day on the ES mailing list.

That's interesting... I'd only seen three spam emails, so I wondered where you got 49 from. I read the posts from my gmail account, so then I checked my spam folder and sure enough there were a lot more emails in there that I was unaware of. I'm going to disable my spam filter for this group so that I get more visibility, and I'd ask the other moderators to do the same. Let's see how it goes for a while longer. We can always revisit this decision later on. clint
Re: Index missing error Elasticsearch Java
Use gateway type local instead of none; then your index persists across cluster restarts. Jörg

On Wed, Jul 2, 2014 at 12:35 AM, venuchitta venu.chitta1...@gmail.com wrote: Hi, I am new to Elasticsearch. I am using the Java API to establish a connection with ES.

public void createIndex(final String index) {
    getClient().admin().indices().prepareCreate(index).execute().actionGet();
}

public void createLocalCluster(final String clusterName) {
    NodeBuilder builder = NodeBuilder.nodeBuilder();
    Settings settings = ImmutableSettings.settingsBuilder()
        .put("gateway.type", "none")
        .put("cluster.name", clusterName)
        .build();
    builder.settings(settings).local(false).data(true);
    this.node = builder.node();
    this.client = node.client();
}

public boolean existsIndex(final String index) {
    IndicesExistsResponse response = getClient().admin().indices().prepareExists(index).execute().actionGet();
    return response.isExists();
}

public void openIndex(String name) {
    getClient().admin().indices().prepareOpen(name).execute().actionGet();
}

createLocalCluster("cerES");
createIndex("news");
System.out.println(existsIndex("news"));

When I execute the above Java code I get a true response. But when I close the Java program and start it again with the following code:

openIndex("news");

it throws IndexMissingException. But I can see the news index in the data folder in Eclipse. So how do I retrieve data from the node previously? Is it lost? Or am I wrong somewhere?
Re: Inter-document Queries
Together with Zennet we brainstormed a solution building on top of Itamar's proposal. In one string field we append the current path to all the previous ones, and since we are talking about funnels we need to store them only on the last event/document generated, e.g. SessionEndedEvent. Then we can use regex pattern matching to identify whether the sequence of steps can be found anywhere in the stored paths string. This solution appears to be extremely fast.

On Wednesday, June 11, 2014 1:14:59 AM UTC+3, Zennet Wheatcroft wrote: I simplified the actual problem in order to avoid explaining the domain-specific details. Allow me to add back more detail. We want to be able to search for multiple points of user action, towards a conversion funnel, and condition on multiple fields. Let's add another field (response) to the above model:

{.., "path": "/promo/A", "response": 200, ..}
{.., "path": "/page/1", "response": 401, ..}
{.., "path": "/promo/D", "response": 200, ..}
{.., "path": "/page/23", "response": 301, ..}
{.., "path": "/page/2", "response": 418, ..}

Let's say we define three points through the conversion funnel:

A: Visited path=/page/1
B: Got response=401 from some path
C: Exited at path=/sale/C

And we want to know how many users did steps A-B-C in that order. If we add an array prev_response like we did for prev_path, then we can use a term filter to find documents with term path=/sale/C and prev_path=/page/1 and prev_response=401. But this will not distinguish between A-B-C and B-A-C. Perhaps I could use the script filter for the last mile and, from the term-filtered results, throw out B-A-C; it would run more quickly because of the reduced document set. Is there another way to implement this query? Zennet

On Wednesday, June 4, 2014 5:01:19 PM UTC-7, Itamar Syn-Hershko wrote: You need to be able to form buckets that can be reduced again, either using the aggregations framework or a query.
One model that will allow you to do that is something like this:

{ "userid": "xyz", "path": "/sale/B", "previous_paths": [...], "tstamp": ..., ... }

So whenever you add a new path, you denormalize and add previous paths that could be relevant. This might bloat your storage a bit and be slower on writes, but it is very optimized for reads, since now you can do an aggregation that queries for the desired path and buckets on the user. To check the condition of the previous path you should be able to bucket again using a script, or maybe even with a query on a nested type. This is just off the top of my head, but it should definitely work if you can get to that model. -- Itamar Syn-Hershko http://code972.com | @synhershko https://twitter.com/synhershko Freelance Developer & Consultant Author of RavenDB in Action http://manning.com/synhershko/

On Thu, Jun 5, 2014 at 2:36 AM, Zennet Wheatcroft zwhea...@atypon.com wrote: Yes. I can re-index the data or transform it in any way to make this query efficient. What would you suggest?

On Wednesday, June 4, 2014 2:14:09 PM UTC-7, Itamar Syn-Hershko wrote: This model is not efficient for this type of querying. You cannot do this in one query using this model, and the pre-processing work you do now + traversing all documents is very costly. Is it possible for you to index the data (even as a projection) into Elasticsearch using a different model, so you can use ES properly using queries or the aggregations framework? -- Itamar Syn-Hershko http://code972.com | @synhershko https://twitter.com/synhershko Freelance Developer & Consultant Author of RavenDB in Action http://manning.com/synhershko/

On Thu, Jun 5, 2014 at 12:04 AM, Zennet Wheatcroft zwhea...@atypon.com wrote: Hi, I am looking for an efficient way to do inter-document queries in Elasticsearch. Specifically, I want to count the number of users that went through an exit point B after visiting point A.
In general terms, say we have some event log data about users' actions on a website:

{"userid": "xyz", "machineid": "110530745", "path": "/promo/A", "country": "US", "tstamp": "2013-04-01 00:01:01"}
{"userid": "pdq", "machineid": "110519774", "path": "/page/1", "country": "CN", "tstamp": "2013-04-01 00:02:11"}
{"userid": "xyz", "machineid": "110530745", "path": "/promo/D", "country": "US", "tstamp": "2013-04-01 00:06:31"}
{"userid": "abc", "machineid": "110527022", "path": "/page/23", "country": "DE", "tstamp": "2013-04-01 00:08:00"}
{"userid": "pdq", "machineid": "110519774", "path": "/page/2", "country": "CN", "tstamp": "2013-04-01 00:08:55"}
{"userid": "xyz", "machineid": "110530745", "path": "/sale/B", "country": "US", "tstamp": "2013-04-01 00:09:46"}
{"userid": "abc", "machineid": "110527022", "path": "/promo/A", "country": "DE", "tstamp": "2013-04-01 00:10:46"}

And we have 500+M such entries. We want a count of the number of userids that visited path=/sale/B after visiting path=/promo/A. What I did was to preprocess the data, sorting by userid, tstamp, then compacting all events by the same userid into the same document. Then I wrote a script filter which traverses the path array
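The concatenated-paths + regex idea from the top of this thread can be sketched as follows: append each funnel step to one string per user/session, then test whether A, B, C occur in that order anywhere in it. The separator character and step encoding here are assumptions, not the original poster's exact format:

```python
import re

def funnel_regex(steps):
    """Regex matching the given steps in order, with anything in between."""
    return re.compile(".*".join(re.escape(s) for s in steps))

# One concatenated string per user/session, steps separated by ">".
abc = "/promo/A>/page/1>401>/sale/C"   # did A (/page/1), then B (401), then C
bac = "401>/page/1>/sale/C"            # B before A: wrong order

pattern = funnel_regex(["/page/1", "401", "/sale/C"])
assert pattern.search(abc)       # A-B-C in order: matches
assert not pattern.search(bac)   # B-A-C: no match
```

This is why the regex approach distinguishes A-B-C from B-A-C, which the term filter on prev_path/prev_response arrays above cannot do: the concatenated string preserves ordering.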
Re: Memory issues on ES client node
I'm not sure, but it looks like a node tries to move some GB of document hits around. This might have triggered timeouts at other places (probably with node disconnects), and maybe the GB chunk is not yet GC-collected, so you see this in your heap analyzer tool. Whether the heaviness of the search result is expected depends on the search results and search hits you generated, so it would be useful to know more about your queries. Jörg

On Wed, Jul 2, 2014 at 3:29 AM, Venkat Morampudi venkatmoramp...@gmail.com wrote: Thanks for the reply, Jörg. I don't have any logs. I will try to enable them, but it would take some time. If there is anything in particular that we need to enable, please let me know. -VM

On Tuesday, July 1, 2014 12:58:21 PM UTC-7, Jörg Prante wrote: Do you have anything in your logs, i.e. many disconnects/reconnects? Jörg

On Tue, Jul 1, 2014 at 7:59 PM, Venkat Morampudi venkatm...@gmail.com wrote: In our Elasticsearch deployment we are seeing random client node crashes due to out-of-memory exceptions. I got the memory dump from one of the crashes and analysed it using the Eclipse memory analyzer. I have attached the leak suspect report. Apparently 242 objects of type org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction are holding almost 8 GB of memory. I have spent some time on the source code but couldn't find anything obvious. I would really appreciate any help with this issue. -VM
ElasticSearch 1.2.1 doesn't run on JDK 1.6?
We have been using an older Elasticsearch version here; upgrading to 1.2.1 shows us 'unknown class version' errors on JDK 1.6. The docs say that JDK 1.6 is supported (and it was). Is there some update here? What is the latest Elasticsearch version available for JDK 1.6?
Re: Elastic Search
Thanks Mark. Yeah, sorry, I realized after the post that I should have used pastebin, but I couldn't edit my post. Yes, I am using the logstash dashboard. I changed the number of pages to a max record size of 10,000 results. I also realized that my query in Kibana was only selecting the last day's worth of records. So in the end I'm a dumbass. Works now after I changed the date for the query. :) Jamie
Re: ElasticSearch 1.2.1 doesn't run on JDK 1.6?
The docs say at least Java 7 is required from ES 1.2.0 on: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup.html For Java 6, you have to use ES versions < 1.2.0. Jörg

On Wed, Jul 2, 2014 at 4:21 PM, David Marko dmarko...@gmail.com wrote: We have been using an older Elasticsearch version here; upgrading to 1.2.1 shows us 'unknown class version' errors on JDK 1.6. The docs say that JDK 1.6 is supported (and it was). Is there some update here? What is the latest Elasticsearch version available for JDK 1.6?
Re: cluster.routing.allocation.enable behavior (sticky shard allocation not working as expected)
On Wed, Jul 02, 2014 at 05:43:26AM -0700, Andrew Davidoff wrote: How can I avoid this, and make shards on a restarted node come back on the same node?

Hello, I have exactly the same issue. My objective is to make a rolling-restart script which waits for green cluster state before restarting each node. I use:

curl -XPUT -s $host:$port/_cluster/settings -d '{"transient":{"cluster.routing.allocation.enable":"new_primaries"}}'

to allow the cluster to work (and be able to create indices) during the restart. But same issue: the node is back up, but nothing happens until I enable all allocation again. I have gone through the Elasticsearch documentation related to recovery, gateway, and cluster settings without finding any parameter to activate or configure this initial recovery of local indices. -- Grégoire
Re: does snapshot restore lead to a memory leak?
So, your search-only machines are running out of memory, while your index-only machines are doing fine. Did I understand you correctly? Could you send me node stats (curl localhost:9200/_nodes/stats?pretty) from the machine that runs out of memory? Please run stats a few times at 1-hour intervals; I would like to see how memory consumption is increasing over time. Please also run nodes info once (curl localhost:9200/_nodes) and post the results here (or send them to me by email). Thanks! On Wednesday, July 2, 2014 10:15:46 AM UTC-4, JoeZ99 wrote: Hey, Igor, thanks for answering! and sorry for the delay. Didn't catch the update. I explain: - we have one cluster of one machine which is only meant for serving search requests. The goal is not to index anything to it. It contains 1.7k indices, give or take. - every day, those 1.7k indices are reindexed and snapshotted in pairs to an S3 repository (producing 850 snapshots). - every day, the read-only cluster of the first point restores those 850 snapshots to update its 1.7k indices from that same S3 repository. It works like a real charm. Load has dropped dramatically, and we can set up a farm of temporary machines to do the indexing duties. But memory consumption never stops growing. We don't get any out-of-memory error or anything. In fact, there is nothing in the logs that shows any error, but after a few days or a week the host has its memory almost exhausted and elasticsearch is not responding. The memory consumption is of course way ahead of the HEAP_SIZE. We have to restart it and, when we do, we get the following error:
java.util.concurrent.RejectedExecutionException: Worker has already been shutdown
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.registerTask(AbstractNioSelector.java:120)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:72)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:56)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioChannelSink.execute(AbstractNioChannelSink.java:34)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.execute(DefaultChannelPipeline.java:636)
    at org.elasticsearch.common.netty.channel.Channels.fireExceptionCaughtLater(Channels.java:496)
    at org.elasticsearch.common.netty.channel.AbstractChannelSink.exceptionCaught(AbstractChannelSink.java:46)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.notifyHandlerException(DefaultChannelPipeline.java:658)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:781)
    at org.elasticsearch.common.netty.channel.Channels.write(Channels.java:725)
    at org.elasticsearch.common.netty.handler.codec.oneone.OneToOneEncoder.doEncode(OneToOneEncoder.java:71)
    at org.elasticsearch.common.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:59)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
    at org.elasticsearch.common.netty.channel.Channels.write(Channels.java:704)
    at org.elasticsearch.common.netty.channel.Channels.write(Channels.java:671)
    at org.elasticsearch.common.netty.channel.AbstractChannel.write(AbstractChannel.java:248)
    at org.elasticsearch.http.netty.NettyHttpChannel.sendResponse(NettyHttpChannel.java:158)
    at org.elasticsearch.rest.action.search.RestSearchAction$1.onResponse(RestSearchAction.java:106)
    at org.elasticsearch.rest.action.search.RestSearchAction$1.onResponse(RestSearchAction.java:98)
    at org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction$AsyncAction.innerFinishHim(TransportSearchQueryAndFetchAction.java:94)
    at org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction$AsyncAction.moveToSecondPhase(TransportSearchQueryAndFetchAction.java:77)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.innerMoveToSecondPhase(TransportSearchTypeAction.java:425)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.onFirstPhaseResult(TransportSearchTypeAction.java:243)
    ...
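Igor's request above (stats a few times at 1-hour intervals, plus nodes info once) can be scripted. A minimal sketch, assuming the affected node answers on localhost:9200 and that three hourly samples are enough; it needs a live cluster, so it is not runnable standalone.

```shell
# Collect node stats at 1-hour intervals, plus node info once, so the
# memory growth between snapshot restores can be compared over time.
for i in 1 2 3; do
  curl -s 'localhost:9200/_nodes/stats?pretty' > "nodes-stats-$(date +%Y%m%dT%H%M).json"
  sleep 3600
done
curl -s 'localhost:9200/_nodes?pretty' > nodes-info.json
```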
geo_polygon filter with non-zero rule filling
Is it possible to apply a geo_polygon filter with a non-zero rule https://en.wikipedia.org/wiki/Nonzero-rule ?
Re: Queries with fields {...} don't return field with dot in their name
Hello Ben, This is definitely an ambiguity. By request.user, in the usual case ES expects data like request: { user: vm }. Try request\.user or something similar, some mechanism to escape the dot. Thanks Vineeth On Wed, Jul 2, 2014 at 1:13 PM, benq benoit.quart...@gmail.com wrote: Hello Vineeth, the items that are indexed in elasticsearch really contain a field named response.user. _source: { clientip: aaa.bbb..ddd, request: http://.aa/b/c, request.accept-encoding: gzip, deflate, request.accept-language: de-ch, response.content-type: text/html; charset=UTF-8, response: 200, response.age: 0, response.user: userAAA, @timestamp: 2014-07-01T12:18:51.501+02:00, } I realize there is an ambiguity between a field with a dot in its name and a field of a child document. Should fields with dots in their names be avoided? Benoît On Tuesday, July 1, 2014 19:17:41 UTC+2, vineeth mohan wrote: Hello Ben, Can you paste a sample feed? Thanks Vineeth On Tue, Jul 1, 2014 at 8:26 PM, benq benoit@gmail.com wrote: Hi all, I have a query that specifies the fields to be returned as described here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-fields.html However, it does not return the fields with a dot in their name, like response.user. For example: { size: 1000, fields: [@timestamp, request, response, response.user, clientip], query: {match_all: {} }, filter: { and: [ { range: { @timestamp: { from: ... ] } } The timestamp, request, response and clientip fields are returned. The response.user is not. Any idea why? Regards, Benoît
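The ambiguity Vineeth describes can be reproduced in isolation. A sketch, assuming a throwaway index on localhost:9200 (the index, type, and document names are made up for illustration); it needs a live node, so it is not runnable standalone.

```shell
# Sketch: a literal dotted key next to a nested object with the same shape.
# Index/type names ("test"/"doc") are hypothetical.
curl -XPUT 'localhost:9200/test/doc/1' -d '{
  "response.user": "userAAA",
  "request": { "user": "vm" }
}'

# Asking for "response.user" in the fields list may be interpreted as the
# object path response -> user rather than the literal dotted key, which
# would explain why the field comes back empty in the thread above.
curl 'localhost:9200/test/_search?pretty' -d '{
  "fields": ["response.user"],
  "query": { "match_all": {} }
}'
```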
Re: get rid of _all to optimize storage and perfs (Re: Splunk vs. Elastic search performance?)
All, This seems apropos to the current discussion and could help clear up some confusion on recommendations, etc. We, Elasticsearch, are hosting a webinar on ELK, given by the Logstash creator, Jordan Sissel. It's today, in 40 minutes. http://www.elasticsearch.org/webinars/introduction-elk-stack/ On Wednesday, July 2, 2014 6:08:34 AM UTC-7, Brian wrote: Patrick, *Well, I did answer your question. But probably not from the direction you expected. hmm no, you didn't. My question was: it looks like I can't retrieve/display [_all field] content. Any idea? and you replied with your logstash template where _all is disabled. I'm interested in disabling _all, but that was not my question at this point.* Fair enough. I don't know the inner details; I am just an enthusiastic end user. To the best of my knowledge, there is no content for the _all field; I view it as an Elasticsearch pseudo field whose name is _all and whose index terms are taken from all fields (by default), but still there is no actual content for it. And after I got into the habit of disabling the _all field, my hands-on exploration of its nuances ended. It's time for the experts to explain! *Your answer to my second message, below, is informative and interesting but fails to answer my second question too. I simply asked whether I need to feed the complete modified mapping of my template or if I can just push the modified part (i.e. the _all: {enabled: false} part).* Again, I have never done this, so I can only tell you what I do. I just cannot tell you all the nuances of what Elasticsearch is capable of. My recommendation is to try it. Elasticsearch is great at letting you experiment and then telling you clearly if your attempt succeeds or fails. So, try your scenario. If it fails, then it didn't work or you did something wrong. If it succeeds, then you can see exactly what Elasticsearch actually accepted as your mapping.
For example: curl 'http://localhost:9200/logstash-2014.06.30/_mapping?pretty=true'; echo This particular query looks at one of my logstash-generated indices, and it lets me verify that Elasticsearch and Logstash conspired to create the mappings I expected. I used this command quite a bit until I finally got everything configured correctly. (I actually verify the mapping via Elasticsearch Head, but under the covers it's the same command.) Brian
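For completeness, disabling _all for future indices is usually done through an index template. A sketch, assuming ES 1.x on localhost:9200; the template name is arbitrary, and it needs a live node, so it is not runnable standalone.

```shell
# Sketch: disable the _all field for all future logstash-* indices
# via an index template (template name "disable_all" is made up).
curl -XPUT 'localhost:9200/_template/disable_all' -d '{
  "template": "logstash-*",
  "mappings": {
    "_default_": { "_all": { "enabled": false } }
  }
}'

# Then verify what was actually accepted, as Brian suggests:
curl 'localhost:9200/logstash-2014.06.30/_mapping?pretty=true'; echo
```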
Re: does snapshot restore lead to a memory leak?
Igor. Yes, that's right. My index-only machines are booted just for the indexing-snapshotting task. Once there are no more tasks in the queue, those machines are terminated. They only handle a few indices each time (their only purpose is to snapshot). I will do as you tell me. I guess I'd better wait for the timeframe in which most of the restores occur, because that's when the memory consumption grows most, so expect those postings in 5 or 6 hours. On Wednesday, July 2, 2014 10:29:53 AM UTC-4, Igor Motov wrote: So, your search-only machines are running out of memory, while your index-only machines are doing fine. Did I understand you correctly? ...
Re: does snapshot restore lead to a memory leak?
This memory issue report might be related: https://groups.google.com/forum/#!topic/elasticsearch/EH76o1CIeQQ Jörg On Wed, Jul 2, 2014 at 5:34 PM, JoeZ99 jzar...@gmail.com wrote: Igor. Yes, that's right. ...
Re: Min Hard Drive Requirements
When I tried to optimize, the index had 51 shards. Regards, Ophir On Wednesday, July 2, 2014 11:27:50 AM UTC+3, Mark Walkom wrote: It will work until it's full, but then ES will fall over. Merging does require a certain amount of disk space, usually the same amount as the segment that is being merged, as it has to take a copy of the shard to work on. So for a 10GB segment, you'd need at least 10GB free. How many shards do you have for the index, or how many are you trying to optimise (merge) down to? Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 2 July 2014 18:13, Ophir Michaeli ophirm...@gmail.com wrote: Hi all, I'm testing the indexing of 100 million documents; it took about 400GB of the hard drive. Is there a minimum free hard drive space needed for the index to work OK? I'm asking because after we indexed 100 million documents we tested the index and it worked OK, but then the optimize took days and the index did not respond. The hard drive had only 10 GB of free space, so we tried to copy the index to a new hard drive with more free space, but the index is still not functioning. Thank you, Ophir
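Mark's point above (a merge copies the segment being merged, so you need at least that much free space) suggests a pre-flight check before running _optimize. A minimal sketch; the data path and the 10 GB threshold are placeholders taken from the thread.

```shell
# Check that the data volume has enough headroom before an optimize/merge.
# A merge needs roughly as much free space as the segment it is copying.
has_headroom() {
  # $1 = a path on the data volume, $2 = required free space in GB
  local avail_kb
  avail_kb=$(df -Pk "$1" | awk 'NR==2 {print $4}')
  [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

# Usage: refuse to optimize when fewer than 10 GB are free
# (/var/lib/elasticsearch is a hypothetical data path).
has_headroom /var/lib/elasticsearch 10 || echo "not enough free space to merge safely"
```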
Re: Custom Query variables ?
If you enable explanations, you can see the rationale behind Lucene's scoring: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-explain.html You are probably correct that the array length is influencing the scoring. By default, Lucene scores fields with fewer terms higher, via length normalization. You can disable norms on the field: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#norms You can fine-tune better by learning how to read Lucene's explanations. It is difficult at first, but it is a useful skill. Cheers, Ivan On Tue, Jul 1, 2014 at 1:02 AM, Pierrick Boutruche pboutru...@octo.com wrote: Up? Any ideas? On Monday, June 30, 2014 17:48:54 UTC+2, Pierrick Boutruche wrote: Hi everyone, I'm creating a little geocoder on my own. My goal is to be able to retrieve a big city or a country from a string on input. This string can be mistyped, so I indexed the geonames cities5000 data (cities with 5000+ inhabitants) and crossed these data with country admin data. So I got a 46000-city index with country, admin and population. I created a search_field in which I put the country, admin and city names plus the alternate names provided in the cities5000 file. I want to search for a string within this array. Currently, I'm just searching with a MatchQuery, like Paris in search_field. Unfortunately, the first result is Paris... in Canada... Still, the search_field data is this, for Paris (CA) and Paris (FR): [u'Paris', u'Paris', u'Canada', u'Ontario', u'Ontario'] [u'Paris', u'Paris', u'France', u'Île-de-France', u'Ile-de-France', u'Paris', u'Paris'] I don't understand why Paris, CA is first, 'cause there are so many more Paris entries in the second one... Anyway, is there any way to make the number of appearances of the query terms make the difference? Because with alternate names there will be many more Paris entries, and that should count. Actually I think the array length matters in the scoring and I don't want it to... I thought of a custom query score, but I don't think I'm able to get the query term in the script query. Any ideas? Thanks!
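Disabling norms as Ivan suggests is a mapping setting on the field. A sketch, assuming ES 1.x; the index and type names ("geocoder"/"city") are made up for illustration, and in 1.x norms are typically set when the index is created. It needs a live node, so it is not runnable standalone.

```shell
# Sketch: create an index whose search_field has length normalization
# disabled, so shorter alternate-name arrays no longer score higher.
curl -XPUT 'localhost:9200/geocoder' -d '{
  "mappings": {
    "city": {
      "properties": {
        "search_field": {
          "type": "string",
          "norms": { "enabled": false }
        }
      }
    }
  }
}'
```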
Re: Custom Query variables ?
For geo search, it would be a good approach to respect the searcher's preference by using a locale, so I suggest adding a locale filter (e.g. fr) to the search. Or an origin could be added to the search query, with all cities ordered by geo distance in relation to that origin. For country search, the origin could be the capital city... Jörg On Wed, Jul 2, 2014 at 6:38 PM, Ivan Brusic i...@brusic.com wrote: If you enable explanations, you can see the rationale behind Lucene's scoring. ...
Defing default mapping to enable _timestamp for all indices
Hi, I have the following ES setting defined in my YAML file:
http.enabled: false
discovery.zen.ping.multicast.enabled: false
index:
  mappings:
    _default_:
      _timestamp:
        enabled: true
        store: true
  analysis:
    analyzer:
      mica_index_analyzer:
        type: custom
        tokenizer: standard
        filter: [standard, lowercase, mica_nGram_filter]
      mica_search_analyzer:
        type: custom
        tokenizer: standard
        filter: [standard, lowercase]
    filter:
      mica_nGram_filter:
        type: nGram
        min_gram: 2
        max_gram: 20
My intention is to enable the _timestamp field for all created indices. The above does not seem to work; is the error in the syntax of the YAML, or am I missing a step? Thanks.
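One alternative to putting the _default_ mapping in elasticsearch.yml is an index template, which applies to every new index matching its pattern. A sketch, assuming ES 1.x on localhost:9200; the template name is arbitrary, and it needs a live node, so it is not runnable standalone.

```shell
# Sketch: enable _timestamp for all future indices via an index template
# (template name "default_timestamp" is made up; "*" matches every index).
curl -XPUT 'localhost:9200/_template/default_timestamp' -d '{
  "template": "*",
  "mappings": {
    "_default_": {
      "_timestamp": { "enabled": true, "store": true }
    }
  }
}'
```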
Re: Wrong Scoring using match query on Sense
If you enable explanations, you would see that length normalization is scoring the document with the shorter field higher than the document with a term frequency of 2. The fieldNorm is incredibly lossy since it uses only 1 byte, so there must be some inconsistencies between the example and your test case. The example has a fieldNorm of 0.375, while it is 0.3125 in your case (and mine as well). The example might not have deleted all the documents in the index before the test. Cheers, Ivan On Tue, Jul 1, 2014 at 1:43 AM, rayman idan.f...@gmail.com wrote: I am trying to exercise the following example using Sense : http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/match-query.html . However when I ran GET /my_index/my_type/_search { query: { match: { title: QUICK! } } } I got wrong scoring. I expect to see doc with is 3. But doc with id got higher score. any idea?: { took: 2, timed_out: false, _shards: { total: 1, successful: 1, failed: 0 }, hits: { total: 3, max_score: 0.5, hits: [ { _index: my_index, _type: my_type, _id: 1, _score: 0.5, _source: { title: The quick brown fox } }, { _index: my_index, _type: my_type, _id: 3, _score: 0.44194174, _source: { title: The quick brown fox jumps over the quick dog } }, { _index: my_index, _type: my_type, _id: 2, _score: 0.3125, _source: { title: The quick brown fox jumps over the lazy dog } } ] } } Thanks. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/38002b2d-6d70-4a20-9820-7814c37e8aea%40googlegroups.com https://groups.google.com/d/msgid/elasticsearch/38002b2d-6d70-4a20-9820-7814c37e8aea%40googlegroups.com?utm_medium=emailutm_source=footer . For more options, visit https://groups.google.com/d/optout. 
[ANN] Elasticsearch Servlet Transport plugin 2.2.0 released
Heya, We are pleased to announce the release of the Elasticsearch Servlet Transport plugin, version 2.2.0. The wares transport plugin allows using the REST interface over servlets: https://github.com/elasticsearch/elasticsearch-transport-wares/

Release Notes - elasticsearch-transport-wares - Version 2.2.0

Update:
* [21] - Update to elasticsearch 1.2.0 (https://github.com/elasticsearch/elasticsearch-transport-wares/pull/21)

New:
* [22] - Add plugin release semi-automatic script (https://github.com/elasticsearch/elasticsearch-transport-wares/issues/22)
* [17] - NodeServlet should use an elasticsearch node created elsewhere in the webapp (https://github.com/elasticsearch/elasticsearch-transport-wares/issues/17)

Issues, pull requests and feature requests are warmly welcome on the elasticsearch-transport-wares project repository: https://github.com/elasticsearch/elasticsearch-transport-wares/ For questions or comments around this plugin, feel free to use the elasticsearch mailing list: https://groups.google.com/forum/#!forum/elasticsearch Enjoy, -The Elasticsearch team
Re: Problem Configuring AWS S3 for Backups
Unfortunately, I tried with and without the region setting, no difference.

On Tuesday, July 1, 2014 7:43:21 PM UTC-4, Glen Smith wrote: I'm not sure it matters, but I noticed you aren't setting a region in either your config or when registering your repo.

On Tuesday, July 1, 2014 7:08:28 PM UTC-4, sabdalla80 wrote: I am not sure the version is the problem; I guess I can upgrade from V1.1 to latest. "Not able to load credentials from the provider chain" - any idea why this error is generated? Is there any other place that my credentials need to be besides the .yml file? Note, I am able to write/read to S3 remotely, so I don't have any privilege problems that I can think of.

On Tuesday, July 1, 2014 4:44:17 PM UTC-4, David Pilato wrote: I think 2.1.1 should work fine as well. That said, you should upgrade to latest 1.1 (or 1.2)... -- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 1 Jul 2014, at 22:13, Glen Smith gl...@smithsrock.com wrote: According to https://github.com/elasticsearch/elasticsearch-cloud-aws/tree/es-1.1 you should use v2.1.0 of the plugin with ES 1.1.0.

On Tuesday, July 1, 2014 9:03:04 AM UTC-4, sabdalla80 wrote: I am having a problem setting up the backup and restore part of AWS on S3. I have the 2.1.1 AWS plugin and ElasticSearch V1.1.0. My yml:

```
cloud:
  aws:
    access_key: #
    secret_key: #
discovery:
  type: ec2
```

When I try to register a repository:

```
PUT /_snapshot/es_repository
{
  "type": "s3",
  "settings": { "bucket": "esbucket" }
}
```

I get this error; it complains about loading my credentials! Is this an ElasticSearch problem or an AWS one? Note I am running as root user ubuntu on EC2, and also running AWS with root privileges as opposed to an IAM role; not sure if that's a problem or not.
error:

```
RepositoryException[[es_repository] failed to create repository]; nested: CreationException[Guice creation errors:

1) Error injecting constructor, com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain
   at org.elasticsearch.repositories.s3.S3Repository.<init>(Unknown Source)
   while locating org.elasticsearch.repositories.s3.S3Repository
   while locating org.elasticsearch.repositories.Repository

1 error]; nested: AmazonClientException[Unable to load AWS credentials from any provider in the chain]; status: 500
```
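For reference, a minimal elasticsearch.yml sketch for the cloud-aws plugin that spells out credentials and a region, followed by a repository registration naming the region explicitly. All key values and the region here are placeholders to adapt, and the thread above reports that adding the region alone did not help, so this is for completeness:

```
cloud:
  aws:
    access_key: YOUR_KEY        # placeholder
    secret_key: YOUR_SECRET     # placeholder
    region: us-east-1           # placeholder; should match the bucket's region
discovery:
  type: ec2
```

```
PUT /_snapshot/es_repository
{
  "type": "s3",
  "settings": {
    "bucket": "esbucket",
    "region": "us-east-1"
  }
}
```

Note that the "provider chain" in the error refers to the AWS SDK's default credential sources (environment variables, system properties, EC2 instance-profile credentials), so attaching an IAM role to the instance is another route worth testing.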
ES doesn't work with rexster gremlin extension
The problem is as the subject says. I'm not sure whether I misunderstood something or missed some configuration. ES works fine in usual situations, but doesn't work with the rexster gremlin extension. In Java, I configured the graph as follows: https://lh3.googleusercontent.com/-kd6vKDQdH6g/U7RbYPE-ciI/AHk/sSU5e5R3DMM/s1600/1.PNG https://lh3.googleusercontent.com/-9HiEPJmS1FQ/U7Rbjmg4oCI/AHs/2FgdAtjiBHc/s1600/2.PNG In the extension, I wrote:

```
String query = "v." + propKey + ":(" + propValue + ")";
if (((TitanGraph) graph).indexQuery("search", query).vertices().iterator().hasNext()) { ... }
```

When I invoked the extension, rexster reported: null pointer exception, unknown index, etc. As follows: https://lh4.googleusercontent.com/-I0w-hVgu0F0/U7RdQCcjOZI/AH4/4l13192eR2k/s1600/3.PNG https://lh3.googleusercontent.com/-phHCRqzQN6I/U7RdUkyE8oI/AIA/rhkTlffd8lI/s1600/4.PNG After this, I searched for advice on Google and made some configuration changes in rexster.xml: https://lh4.googleusercontent.com/-_6zRCaQtvRw/U7ReJ41k4jI/AIQ/Nwgon5WqPtU/s1600/6.PNG Then the problems changed. (As you can see, first_example is the name of the graph.) https://lh5.googleusercontent.com/-r8TvH5CxjqA/U7Rfb9NFNEI/AIs/epemSCk5-8c/s1600/8.PNG Besides, when I invoked the extension, I was told: the graph is non-configured. https://lh4.googleusercontent.com/-uPWbHmgKL28/U7RfO3PoVlI/AIk/6ndP888fGYA/s1600/7.PNG I've also tried the embedded mode. The problems seem to be the same. Plus, I'm using: Titan 0.4.2, Tinkerpop 0.2.4, Cassandra 2.0.7, ElasticSearch 1.2.1. This problem's driving me crazy. Any pointer would be appreciated! Thanks in advance!
Re: Recommended Hardware Specs Sharding\Index Strategy
When you say "do not let a shard grow bigger than your JVM heap (this is really a rough estimation) so segment merging will work flawlessly", are we counting all the primary and replica shards of all indexes on that node? So for example, say we had two indexes on a 10 node cluster. Each index has 10 shards and 1 replica (40 shards total in the cluster). So per node, should the heap size be larger than: 1 shard for the first index, 1 shard for the replica of the first index, 1 shard for the second index, 1 shard for the replica of the second index - the four shards combined? Thanks again for your advice

On Saturday, August 10, 2013 6:50:27 AM UTC-7, Jörg Prante wrote: Your concern is a single shard getting too big. If you use a 64bit JVM and mmapfs (quite common), you can open even the largest files. So from this point of view, a node can handle the biggest files. There is no real limit. Another question is throughput performance with large shard files. For example, the more mixed read/write operations are in the workload, the smaller the Lucene indexes should be, to allow the JVM/OS a better load distribution. For selecting a total number of shards and shard size, here are some general rules of thumb:

- do not select a smaller number of shards than the total number of nodes you will add to the cluster. Each node should hold at least one shard.
- do not let a shard grow bigger than your JVM heap (this is really a rough estimation) so segment merging will work flawlessly
- if you want fast recovery, or if you want to move shards around (not a common case), the smaller a shard is, the faster the operation will get done

In case you are worried about shards getting out of bounds, you can reindex with a higher number of shards (having the _source enabled is always an advantage for reindexing) with your favorite custom tool. Reindexing can take significant time, and may not be an option if you can't stop indexing.
Jörg

On Fri, Aug 9, 2013 at 4:32 PM, David Arata david...@gmail.com wrote: My concern is what would be the best strategy so that an index or a single shard in an index does not get too big for a node to handle, and if it's approaching that size, what can be done?
Documents not being stored
Hello, I am attempting to set up a large scale ELK setup at work. Here is a basic setup of what we have so far:

```
Nodes (approx 150) [logstash] | | +---+ | | Indexer1 Indexer2 [Redis] [Redis] [Logstash] [Logstash] | | | | ++--+ | | ES Master -- Kibana3 [Master: yes] [Data: no] | | ES Data (4 data nodes) [Master: no] [Data: yes]
```

In case the formatting does not hold with the above, I've created a paste here: https://baneofswitches.privatepaste.com/c8dfc2c30b

The Setup
=========

* We have approximately 150 nodes configured to send to a shuffled Redis instance on either Indexer1 or Indexer2. A sanitized version of the node Logstash config is here: https://baneofswitches.privatepaste.com/345b94064d
* Each indexer is identical. They both run their own independent Redis service. They then each have a Logstash service that pulls events from Redis and pushes them to the ES Master. They are using the http protocol. A sanitized version of their config is here: https://baneofswitches.privatepaste.com/e19eae690f
* The ES Master is configured to only be a master, and is not set to be a data node. It has 32 GB of RAM.
* There are 4 ES data nodes, configured to be data nodes only; they have been configured to be ineligible to be elected as masters. They have 62 GB RAM and the storage for ES is on SSDs.
* We have Kibana3 configured to search from the ES Master.
* The average # of logs generated by all nodes total seems to be approximately 7k/sec, with peaks up to about 16k/s.
* Indexer throughput seems to be good enough that one indexer can work just fine during normal usage.
* We are using the default 5 shards with 1 replica.

The Problem
===========

When this setup is loaded as mentioned above, we are noticing that some logs are being dropped. We were able to test this by running something like:

```
seq 1 5000 | xargs -I{} -n 1 -P 40 logger "Testing unqString {} of 5000"
```

Sometimes we would see all 5000 show up in Kibana, other times a subset of them (for example 4800 events).
Troubleshooting
===============

We have taken a number of steps to eliminate possibilities. We have confirmed that logs are being reliably transferred from nodes to Redis and from Redis through Logstash. We confirmed this by monitoring counts over many trials. The Redis -> Logstash leg was tested by outputting to a file and comparing counts. That left the Logstash -> ES leg. We tested this by writing a script that pushed fake events via the bulk API. We were unable to reproduce the problem with one request. However, when the cluster is under load (we let 'real' logs flow) and we push via the bulk API with our script, we occasionally see partial loss of data. It's important to note that partial loss here means that the request succeeds (200 return code) and much of the data in the bulk request is then searchable; however, not all of it will be. For example, if we put the cluster under load and push a request with a bulk of 5000 events in it, we will see 4968 of the 5000 in our subsequent search. We have tried increasing the bulk API threadpool as well as giving a greater percentage (50%) to the indexing buffer. Neither has fixed the issue.

Conclusion
==========

I am looking for feedback on how to troubleshoot this further and find the cause. I am also looking for information on whether anyone else out there is handling these sorts of incoming volumes, and what sorts of things they had to do to get their setup working. I appreciate all feedback.
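One check worth automating on the Logstash -> ES leg: a bulk request can return HTTP 200 overall while individual items inside it fail (for example with EsRejectedExecutionException when the bulk threadpool queue is full), which would look exactly like this silent partial loss. A minimal sketch in plain Python that inspects a bulk-response body for per-item errors; the response JSON below is a fabricated example in the 1.x bulk format, not data from the thread:

```python
import json

def failed_bulk_items(bulk_response_body):
    """Given the JSON body of a _bulk response, return the items that
    reported a per-item error, even if the HTTP status was 200."""
    resp = json.loads(bulk_response_body)
    failures = []
    for item in resp.get("items", []):
        # Each item is keyed by its action type: index, create, update, delete.
        for action, result in item.items():
            # ES 1.x reports per-item failures via an "error" field.
            if "error" in result:
                failures.append((action, result.get("_id"), result["error"]))
    return failures

# Fabricated response: one of two index actions was rejected.
body = json.dumps({
    "took": 3, "errors": True,
    "items": [
        {"index": {"_index": "logs", "_id": "1", "status": 201}},
        {"index": {"_index": "logs", "_id": "2", "status": 429,
                   "error": "EsRejectedExecutionException[rejected]"}},
    ],
})
print(failed_bulk_items(body))  # [('index', '2', 'EsRejectedExecutionException[rejected]')]
```

If rejections show up here under load, raising the bulk queue size or throttling the indexers is the usual next step.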
Re: [ANN] ElasticUI AngularJS Directives - Easily Build an Interface on top of Elasticsearch
Great idea. I'll give it a try ASAP.

On Wednesday, July 2, 2014 10:56:48 PM UTC+12, Yousef El-Dardiry wrote: Hi all, I just open sourced a set of AngularJS directives for Elasticsearch. It enables developers to rapidly build a frontend (e.g. a faceted search engine) on top of Elasticsearch. http://www.elasticui.com (or github https://github.com/YousefED/ElasticUI) It makes creating an aggregation and listing the buckets as simple as:

```
<ul eui-aggregation="ejs.TermsAggregation('text_agg').field('text').size(10)">
  <li ng-repeat="bucket in aggResult.buckets">{{bucket}}</li>
</ul>
```

I think this was currently missing in the ecosystem, which is why I decided to build and open source it. I'd love any kind of feedback. - Yousef

Another example: add a checkbox facet based on a field, using one of the built-in widgets (https://github.com/YousefED/ElasticUI/blob/master/docs/widgets.md):

```
<eui-checklist field="facet_field" size="10"></eui-checklist>
```

Resulting in [image: checklist screenshot] -- See why you should attend BroadSoft Connections 2014 http://broadsoftconnections.com/ This email is intended solely for the person or entity to which it is addressed and may contain confidential and/or privileged information. If you are not the intended recipient and have received this email in error, please notify BroadSoft, Inc. immediately by replying to this message, and destroy all copies of this message, along with any attachment, prior to reading, distributing or copying it.
Re: Min Hard Drive Requirements
Ok, how many were you reducing to? How big is the index? Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 3 July 2014 02:03, Ophir Michaeli ophirmicha...@gmail.com wrote: When I tried to optimize, the index had 51 shards. Regards, Ophir

On Wednesday, July 2, 2014 11:27:50 AM UTC+3, Mark Walkom wrote: It will work until it's full, but then ES will fall over. Merging does require a certain amount of disk space, usually the same amount as the segment that is being merged, as it has to take a copy of the shard to work on. So for a 10GB segment, you'd need at least 10GB free. How many shards do you have for the index, or how many are you trying to optimise (merge) down to? Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 2 July 2014 18:13, Ophir Michaeli ophirm...@gmail.com wrote: Hi all, I'm testing the indexing of 100 million documents; it took about 400GB of the hard drive. Is there a minimum free hard drive space needed for the index to work OK? I'm asking because after we indexed 100 million documents we tested the index and it worked OK, but then when trying to optimize, the optimize took days and then the index did not respond. The hard drive had only 10 GB free space, so we tried to copy the index to a new hard drive with more free space, but the index is still not functioning. Thank you, Ophir
Re: Are there any facets that can be used to co-relate log events ?
Hi Aditya, I'm looking to do something similar; did you have any success with this problem? Thanks, Matt

On Wednesday, January 22, 2014 11:53:36 PM UTC+13, Aditya Pavan Kumar Vegesna wrote: Hi, I am looking for a way to correlate multiple log events and then calculate the time duration between those events, e.g. a request log event and a response log event - calculating the difference in their timestamps to assess the performance of the application. Can anyone help me with how this can be achieved? Thanks, Pavan Kumar
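Facets/aggregations in ES 1.x can't subtract timestamps across two documents, so the pairing usually happens outside ES (for example in Logstash's elapsed filter, or a small script) keyed on a shared request ID. A minimal sketch of that pairing in plain Python; the field names request_id, event and @timestamp are assumptions for illustration, not from the thread:

```python
from datetime import datetime

def durations(events):
    """Pair request/response events by request_id and return the
    elapsed seconds between each pair."""
    starts = {}
    out = {}
    for ev in events:
        ts = datetime.strptime(ev["@timestamp"], "%Y-%m-%dT%H:%M:%S")
        if ev["event"] == "request":
            starts[ev["request_id"]] = ts
        elif ev["event"] == "response" and ev["request_id"] in starts:
            # Found the matching request: record the elapsed time.
            out[ev["request_id"]] = (ts - starts.pop(ev["request_id"])).total_seconds()
    return out

sample = [
    {"request_id": "r1", "event": "request",  "@timestamp": "2014-07-02T10:00:00"},
    {"request_id": "r1", "event": "response", "@timestamp": "2014-07-02T10:00:03"},
]
print(durations(sample))  # {'r1': 3.0}
```

The computed durations can then be indexed back into ES as their own documents, at which point ordinary histograms and stats facets work on them.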
Re: Kibana browser compatibility issues
We are using Logstash-ElasticSearch-Kibana and just want to be able to open the index file in Kibana. What is the necessary plugin that will allow us to do this in something other than Firefox?

On Monday, June 2, 2014 11:56:35 AM UTC-7, Binh Ly wrote: If you simply point the browser at the file system index.html, in my experience that only works in Firefox (and only if you explicitly use "http://server:9200"). The Kibana default assumes that you actually run Kibana from a web server (or as an ES site plugin if you prefer) and that ES is accessible from the same host as where Kibana is being served from.
Re: Kibana browser compatibility issues
Laura, The simplest way is to install Kibana as a site plugin on the same node on which you run Elasticsearch. Not the best way from a performance and security perspective, but certainly the easiest way to start with an absolute minimum of extra levers to pull and knobs to turn, so to speak. So what does that really mean, a site plugin? Assume you configure Elasticsearch to look for plugins within the /opt/elk/plugins directory. Then you unpack the Kibana3 distribution within /opt/kibana3. That means you'll see the following files within /opt/kibana3/kibana-3.1.0:

```
app  build.txt  config.js  css  favicon.ico  font  img  index.html  LICENSE.md  README.md  vendor
```

So then create the /opt/elk/plugins/kibana3 directory. Then:

```
$ ln -s /opt/kibana3/kibana-3.1.0 /opt/elk/plugins/kibana3/_site
```

Now when you start ES and point it to the correct configuration file, which in turn points it to the plugins directory as described above, Kibana will be available at the following URL (assuming you're on the same host; change localhost as needed, of course): http://localhost:9200/_plugin/kibana3/ Hope this helps! Brian
Visibility
Hi, I'm trying to get a lot more visibility and metrics into what's going on under the hood. Occasionally, we see spikes in memory. I'd like to get heap memory used on a per-shard basis. If I'm not mistaken, somewhere somehow, the Lucene index that is a shard is using memory in the heap, and I'd like to collect that metric. It may also be an operation somewhere higher up at the elasticsearch level, where we are merging results from shards or results from indexes (maybe elasticsearch doesn't bother to merge twice but merges once); that's also a memory space I'd like to collect data on. I think per-query memory use would also be something interesting, though perhaps obviously too much to keep up with for every query (maybe a future opt-in feature, unless it's already there and I'm missing it). Other cluster events, like nodes entering and exiting the cluster or the changing of the master, would also be nice to collect. I'm guessing some of this isn't available and some of it is, but my Google-Fu seems to be lacking. I'm pretty sure I can poll to figure out that the events happened, but was wondering if there was something in the Java client node where I could get a Future or some other hook to turn it into a push instead of a pull. Any help will be appreciated. I'm aware it's a wide net though. --Shannon Monasco
downside to using Bulk API for small/single-doc sets?
Hi, I am using the ES Java API to talk to an ES server. Sometimes I need to index a single doc, sometimes dozens or hundreds at a time. I'd prefer to keep my code simple (am a contrarian thinker) and wonder if I can get away with always using the bulk API (ie BulkRequestBuilder), so that my interface to ES would look like so:

```
void indexDoc(Doc doc);
void indexDocs(Collection<Doc> docs);
```

...but the impl would always delegate to BulkRequestBuilder - with the number of actions sometimes being ~1. Is there a performance (or other) downside to this approach? Specifically, would bulk index updates (with set size == 1) take significantly longer than non-bulk updates? thanks, -nikita
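For what it's worth, the delegation itself is trivial; sketched here in Python for brevity, with a stand-in bulk_index callable in place of a real BulkRequestBuilder execution (all names are illustrative, not an ES API):

```python
def make_indexer(bulk_index):
    """Return (index_doc, index_docs) where the single-doc path simply
    delegates to the bulk path with a one-element batch."""
    def index_docs(docs):
        return bulk_index(list(docs))
    def index_doc(doc):
        return index_docs([doc])
    return index_doc, index_docs

# Stand-in bulk implementation that just records batch sizes.
batches = []
index_doc, index_docs = make_indexer(lambda docs: batches.append(len(docs)))
index_doc({"id": 1})
index_docs([{"id": 2}, {"id": 3}])
print(batches)  # [1, 2]
```

The open question in the thread is only whether the one-element bulk request carries measurable overhead on the server side, which is easy to benchmark with exactly this kind of wrapper.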
Re: Recommended Hardware Specs Sharding\Index Strategy
The heap should be as big as your largest shard, irrespective of what index it belongs to or if it's a replica. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 3 July 2014 05:50, mrno42 doug...@gmail.com wrote: When you say - do not let a shard grow bigger than your JVM heap (this is really a rough estimation) so segment merging will work flawlessly are we counting all the primary and replicas shards of all indexes on that node? So for example, if we had two indexes with on 10 node cluster. Each index has 10 shards and 1 replica(40 total in cluster). So per node, the heap size should be larger than: 1 shard for first index 1 shard for replica of first index 1 shard for second index 1 shard for replica second index the four shards combined? Thanks again for your advice On Saturday, August 10, 2013 6:50:27 AM UTC-7, Jörg Prante wrote: Your concern is a single shard getting too big. If you use 64bit JVM and mmapfs (quite common), you can open even the largest files. So from this point of view, a node can handle the biggest files. There is no real limit. Another question is throughput performance with large shard files. For example, the more mixed read/write operations are in the workload, the smaller the Lucene indexes should be, to allow the JVM/OS a better load distribution. For selecting a total number of shards and shard size, here are some general rules of thumb: - do not select a smaller number of shards than your total number of nodes you will add to the cluster. Each node should hold at least one shard. 
- do not let a shard grow bigger than your JVM heap (this is really a rough estimation) so segment merging will work flawlessly - if you want fast recovery, or if you want to move shards around (not a common case), the smaller a shard is the faster the operation will get done In case you are worried about shards getting out of bounds, you can reindex with a higher number of shards (having the _source enabled is always an advantage for reindexing) with your favorite custom tool. Reindexing can take significant time, and may not be an option if you can't stop indexing. Jörg On Fri, Aug 9, 2013 at 4:32 PM, David Arata david...@gmail.com wrote: My concern is what would be the best strategy so that an index or a single shard in an index does not get too big for a node to handle, and if it's approaching that size, what can be done?
Re: Looking to build a logging solution with threshold alerting.
There was another thread on this very recently, and some people are using riemann for this. Take a look in the archives and you can probably find some useful info. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 2 July 2014 22:53, Joshua Hall joshuadeanh...@gmail.com wrote: I am looking to build a logging solution and wanted to make sure that I am not missing any key components. The logs that I have are currently stored in a database to which there is limited access, due to locking risks from bad queries. My plan is to have the DBAs write the logs from the database tables to a file on a set interval, then have Logstash pick up the logs and write them to Elasticsearch. Then for viewing/searching the logs I will be using Kibana. Everything up to this point I have been able to make a proof of concept for, but the other request was to have alerting. I have spent some time looking at this, and the general response seems to be to use percolation, but that only makes sense if you want to send an alert when a single error matches a query; from what I have seen, there is no way to build a threshold alerting system using percolation. My thought to solve the threshold alerting is to create a simple web UI that allows the user to enter a query to search for, a threshold, a time frame, and emails to send the alert to, all of which would get stored in Elasticsearch. Then an app (running as a Windows service or cron job) would pull the alerts, run the queries, and check the time frame and threshold (it would run on some interval). If the count surpasses the threshold, it would send an email to the stored addresses. I know that SPM seems to cover this and more, but we are currently looking to see if we can do this without buying another product. Is this the correct approach to take, or should I be looking at doing something else?
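The polling app described in the question is simple to sketch. A minimal version in plain Python, where count_fn stands in for a count query against ES over the alert's time frame; all names and numbers are illustrative, not from the thread:

```python
def check_alerts(alerts, count_fn):
    """Run each saved alert's query over its time frame and return the
    alerts whose hit count exceeds the threshold."""
    triggered = []
    for alert in alerts:
        hits = count_fn(alert["query"], alert["timeframe_minutes"])
        if hits > alert["threshold"]:
            triggered.append((alert["name"], hits))
    return triggered

alerts = [
    {"name": "login errors", "query": "error AND login",
     "timeframe_minutes": 5, "threshold": 100},
    {"name": "timeouts", "query": "timeout",
     "timeframe_minutes": 5, "threshold": 10},
]
# Stand-in counts: pretend ES returned 42 and 57 hits respectively.
fake_counts = {"error AND login": 42, "timeout": 57}
print(check_alerts(alerts, lambda q, t: fake_counts[q]))  # [('timeouts', 57)]
```

In a real deployment, count_fn would issue a _count or filtered search restricted to a @timestamp range, and the triggered list would feed the email step.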
Re: Visibility
Depends what you want to do, really. There are plugins like ElasticHQ, Marvel, kopf and bigdesk that will give you some info. You can also hook collectd into the stack and take metrics, or use plugins from Nagios etc. What monitoring platforms do you have in place now?

Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 3 July 2014 07:49, smonasco smona...@gmail.com wrote:

Hi, I'm trying to get a lot more visibility and metrics into what's going on under the hood. Occasionally we see spikes in memory. I'd like to get heap memory used on a per-shard basis. If I'm not mistaken, somewhere, somehow, the Lucene index that backs a shard is using memory in the heap, and I'd like to collect that metric. It may also be an operation somewhere higher up at the Elasticsearch level, where results are merged from shards or from indexes (maybe Elasticsearch doesn't bother to merge twice but merges once); that's also a memory space I'd like to collect data on.

Per-query memory use would also be interesting, though perhaps obviously too much to keep up with for every query (maybe a future opt-in feature, unless it's already there and I'm missing it). Other cluster events, like nodes entering and exiting the cluster or the master changing, would also be nice to collect. I'm guessing some of this isn't available and some of it is, but my Google-fu seems to be lacking. I'm pretty sure I can poll to figure out that the events happened, but I was wondering if there is something in the Java client node where I could get a Future or some other hook to turn it into a push instead of a pull. Any help will be appreciated; I'm aware it's a wide net.

--Shannon Monasco
Bulk API: different values for the same parameter at batch and operation level?
Hi there, I noticed that in the Java bulk API, some parameters can be set on both the per-batch-request level and the per-operation level, e.g. the consistency level parameter: BulkRequestBuilder#setConsistencyLevel vs. IndexRequestBuilder#setConsistencyLevel. What if the parameter has different values at these two levels? Will the per-operation one override the per-batch-request one? Thanks
Re: Splunk vs. Elastic search performance?
In the latest version of Logstash, you can use the elasticsearch output and just set the protocol to http. The elasticsearch_http output will be removed eventually.

On Monday, June 23, 2014 9:22:28 AM UTC-7, Ivan Brusic wrote:

I agree. I thought elasticsearch_http was actually the recommended route. Also, I have seen no reported issues with different client/server versions since 1.0. My current Logstash setup (which is not production level, simply a dev logging tool) uses Elasticsearch 1.2.1 with Logstash 1.4.1 using the non-HTTP interface. -- Ivan

On Fri, Jun 20, 2014 at 3:29 PM, Mark Walkom ma...@campaignmonitor.com wrote:

I wasn't aware that the elasticsearch_http output wasn't recommended? When I spoke to a few of the ELK devs a few months ago, they indicated that there was minimal performance difference, with the greater benefit of not being locked to specific LS+ES versioning.

Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 21 June 2014 02:43, Brian brian@gmail.com wrote:

Thomas, thanks for your insights and experiences. As someone who has explored and used ES for over a year but is relatively new to the ELK stack, your data points are extremely valuable. Let me offer some of my own views.

Re: double the storage. I strongly recommend ELK users disable the _all field. The entire text of the log events generated by Logstash ends up in the message field (not @message, as many people incorrectly post), so the _all field is just redundant overhead with no added value. The result is a dramatic drop in database file sizes and a dramatic increase in load performance. Of course, you then need to configure ES to use the message field as the default for a Lucene Kibana query.
During the year that I've used ES and watched this group, I have been on the front line of a brand-new product with a smart and dedicated development team working steadily to improve it. Six months ago the ELK stack eluded me and reports weren't encouraging (with the sole exception of the Kibana web site's marketing pitch). But ES has come a long way since then, and the ELK stack is much more closely integrated.

The Splunk UI is carefully crafted to isolate users from each other and prevent external (to the Splunk db itself, not to our company) users from causing harm to data. But Kibana seems to be meant for a small cadre of trusted users. What if I write a dashboard with the same name as someone else's? Kibana doesn't even begin to discuss user isolation, but I am confident that it will. How can I tell Kibana to set the default Lucene query operator to AND instead of OR? Google is not my friend: I keep getting references to the Ruby versions of Kibana, and that's ancient history by now. Kibana is cool and promising, but it has a long way to go before deployment to all of the folks in our company who currently have access to Splunk.

Logstash has a nice book that's been very helpful, and Logstash itself has been an excellent tool for prototyping. The book has been invaluable in helping me extract dates from log events and handle all of our different multiline events. But it still doesn't explain why the date filter needs a different array of matching strings to get the date that the grok filter has already matched and isolated. And recommendations to avoid the elasticsearch_http output and use elasticsearch (via the node client) directly contradict the fact that Logstash's 1.1.1 version of the ES client library is not compatible with the most recent 1.2.1 version of ES.
And Logstash is also a resource hog, so we eventually plan to replace it with Perl and Apache Flume (already in use) and pipe into my Java bulk load tool (which is always kept up to date with the versions of ES we deploy!). Because we send the data via Flume to our data warehouse, any losses in ES will be annoying but not catastrophic. The front-end following of rotated log files will be done with the GNU *tail -F* command: with its uppercase -F option, GNU tail follows rotated log files perfectly. I doubt that Logstash can do the same, and we currently see that neither can Splunk (so we sporadically lose log events in Splunk too). So GNU tail -F piped into Logstash with the stdin input works perfectly in my evaluation setup and will likely form the first stage of any log forwarder we end up deploying.

Brian

On Thursday, June 19, 2014 8:48:34 AM UTC-4, Thomas Paulsen wrote:

We had a 2.2 TB/day installation of Splunk and ran it on VMware with 12 indexers and 2 search heads. Each indexer had 1000 IOPS guaranteed. The system is slow but OK to use. We tried Elasticsearch and were able to get the same performance with the same number of machines.
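Brian's recommendation to disable the _all field can be applied via an index template along these lines (ES 1.x mapping syntax; the template name and pattern are illustrative):

```json
{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "_all": { "enabled": false }
    }
  }
}
```

With _all disabled, set the message field as the default search field (index.query.default_field) so unqualified Kibana/Lucene queries still match.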
elasticsearch high cpu usage
Hi, I have a 5-node cluster with 1 replica. Total document size is 216 MB across 853,000 docs. I am suffering from very high CPU usage, every hour and every early morning from about 05:00 to 09:00, as you can see in my Cacti graph. Only Elasticsearch runs on this server, so I thought something was wrong with the ES process, but there are only a few server requests at CPU peak time, and there is no cron job either.

$ ./elasticsearch -v
Version: 1.1.1, Build: f1585f0/2014-04-16T14:27:12Z, JVM: 1.7.0_55

$ java -version
java version 1.7.0_55
Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)

Installed Elasticsearch plugins: HQ, bigdesk, head, kopf, sense.

ES log at CPU peak time:

[2014-07-03 08:01:00,045][DEBUG][action.search.type ] [node1] [search][4], node[GJjzCrLvQQ-ZRRoqL13MrQ], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@451f9e7c] lastShard [true]
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 300) on org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4@68ab486b
    at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:62)
    at java.util.concurrent.ThreadPoolExecutor.reject(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.onFirstPhaseResult(TransportSearchTypeAction.java:293)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.onFirstPhaseResult(TransportSearchTypeAction.java:300)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.start(TransportSearchTypeAction.java:190)
    at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:59)
    at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:49)
    at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
    at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:108)
    at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:43)
    at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
    at org.elasticsearch.client.node.NodeClient.execute(NodeClient.java:92)
    at org.elasticsearch.client.support.AbstractClient.search(AbstractClient.java:212)
    at org.elasticsearch.rest.action.search.RestSearchAction.handleRequest(RestSearchAction.java:98)
    at org.elasticsearch.rest.RestController.executeHandler(RestController.java:159)
    at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:142)
    at org.elasticsearch.http.HttpServer.internalDispatchRequest(HttpServer.java:121)
    at org.elasticsearch.http.HttpServer$Dispatcher.dispatchRequest(HttpServer.java:83)
    at org.elasticsearch.http.netty.NettyHttpServerTransport.dispatchRequest(NettyHttpServerTransport.java:291)
    at org.elasticsearch.http.netty.HttpRequestHandler.messageReceived(HttpRequestHandler.java:43)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:145)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
    at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
    at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
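For what it's worth, the EsRejectedExecutionException above means the search thread pool's queue (capacity 300) overflowed under a burst of search requests. On ES 1.x the queue can be enlarged in elasticsearch.yml, though this only masks the symptom if something is flooding the cluster with searches at those peak times:

```yaml
# elasticsearch.yml (ES 1.x): raise the search queue from its default.
# This relieves rejections but does not address the source of the burst.
threadpool.search.queue_size: 1000
```

Finding what issues the queries at 05:00-09:00 (scheduled jobs, monitoring polls, client retries) is still the real fix.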
Re: Visibility
I currently record basically everything in bigdesk: all the numerics from cluster health, cluster state, nodes info, node stats, index status and segments. I want memory allocated on a per-shard level for Lucene-level actions and for query-level actions (outside the field and filter caches), plus hooks into events like nodes entering and exiting the cluster, new indexes, alias and other administrative changes, and master elections. Basically, when it comes to memory I'd like to have all parts of the heap accounted for. Field + filter cache does not account for whatever process is spiking, nor does it explain most of the heap: with 29 GB in use and garbage collection taking minutes but not reclaiming anything, Elasticsearch only reports 7 GB in cache. We can discuss my particular memory problems and solutions, but mostly I'm after the visibility.

--Shannon Monasco

On Jul 2, 2014 5:50 PM, Mark Walkom ma...@campaignmonitor.com wrote:

Depends what you want to do, really. There are plugins like ElasticHQ, Marvel, kopf and bigdesk that will give you some info. You can also hook collectd into the stack and take metrics, or use plugins from Nagios etc. What monitoring platforms do you have in place now?
Re: Are there any facets that can be used to co-relate log events ?
Hey Matthew, sorry, no luck with that. Cheers, Aditya

On Jul 3, 2014 2:22 AM, Matthew Morrison mmorri...@broadsoft.com wrote:

Hi Aditya, I'm looking to do something similar; did you have any success with this problem? Thanks, Matt

On Wednesday, January 22, 2014 11:53:36 PM UTC+13, Aditya Pavan Kumar Vegesna wrote:

Hi, I am looking for a way to correlate multiple log events and then calculate the time duration between those events, e.g. a request log event and a response log event, computing the difference in timestamps to assess the performance of the application. Can anyone help me with how this can be achieved? Thanks, Pavan Kumar
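Since facets alone can't join two separate documents, the usual answer is to correlate outside Elasticsearch. A minimal sketch of that post-processing, pairing request and response events by a shared correlation id and computing the elapsed time (event shape and field names are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: pair request/response log events by a shared
// correlation id and compute the elapsed time between their timestamps.
public class EventCorrelationSketch {
    // correlation id -> request timestamp (epoch milliseconds)
    static Map<String, Long> requestTimes = new HashMap<>();

    // Record a request event as it streams past.
    static void onRequest(String correlationId, long timestampMillis) {
        requestTimes.put(correlationId, timestampMillis);
    }

    // On a response, return the duration in ms, or -1 if no matching request was seen.
    static long onResponse(String correlationId, long timestampMillis) {
        Long start = requestTimes.remove(correlationId);
        return start == null ? -1 : timestampMillis - start;
    }

    public static void main(String[] args) {
        onRequest("req-42", 1_000L);
        System.out.println(onResponse("req-42", 1_250L)); // 250
        System.out.println(onResponse("req-99", 2_000L)); // -1 (unmatched)
    }
}
```

The computed duration could then be indexed as its own field, at which point ordinary stats facets/aggregations over it become straightforward.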