Re: duplicate documents in query
1.5.2

On Wednesday, April 29, 2015 at 4:44:09 PM UTC+2, Georgi Ivanov wrote:

Hi, I have a strange issue: I get duplicate documents when querying.

GET track_2011*/_search
{
  "query": {
    "bool": {
      "must": [
        { "range": { "ts": { "gte": "2011-08-30T00:00:00Z", "lte": "2011-08-31T23:59:00Z" } } },
        { "term": { "entity_id": { "value": 298082 } } }
      ]
    }
  },
  "sort": [ { "ts": { "order": "asc" } } ],
  "size": 90
}

Result (there are more hits; showing only the duplicates):

{
  "_index": "track_201108",
  "_type": "position",
  "_id": "298082_1314758608000_1302",
  "_score": null,
  "_source": { "ts": 1314758608000, "entity_id": 298082, "loc": { "type": "point", "coordinates": [103.69478, 1.2346333] } },
  "sort": [1314758608000]
},
{
  "_index": "track_201108",
  "_type": "position",
  "_id": "298082_1314758608000_1302",
  "_score": null,
  "_source": { "ts": 1314758608000, "entity_id": 298082, "loc": { "type": "point", "coordinates": [103.69478, 1.2346333] } },
  "sort": [1314758608000]
}

But if I get the document directly:

curl -s es01.host.com:9200/track_201108/position/298082_1314758608000_1302 | json_pp
{
  "found": true,
  "_version": 1,
  "_type": "position",
  "_index": "track_201108",
  "_source": { "hourly": false, "loc": { "type": "point", "coordinates": [103.69478, 1.2346333] }, "ts": 1314758608000, "entity_id": 298082 },
  "_id": "298082_1314758608000_1302"
}

So I have only one document (and it was never updated, since the version is 1). I don't understand what is going on here: no special routing, no parent/child relations. Any ideas?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/7e7a3514-bc91-414b-a88f-fe93c17df7ed%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
duplicate results in query
Hi, I have a strange issue: I get duplicate documents when querying.

GET track_2011*/_search
{
  "query": {
    "bool": {
      "must": [
        { "range": { "ts": { "gte": "2011-08-30T00:00:00Z", "lte": "2011-08-31T23:59:00Z" } } },
        { "term": { "entity_id": { "value": 298082 } } }
      ]
    }
  },
  "sort": [ { "ts": { "order": "asc" } } ],
  "size": 90
}

Result (there are more hits; showing only the duplicates):

{
  "_index": "track_201108",
  "_type": "position",
  "_id": "298082_1314758608000_1302",
  "_score": null,
  "_source": { "ts": 1314758608000, "entity_id": 298082, "loc": { "type": "point", "coordinates": [103.69478, 1.2346333] } },
  "sort": [1314758608000]
},
{
  "_index": "track_201108",
  "_type": "position",
  "_id": "298082_1314758608000_1302",
  "_score": null,
  "_source": { "ts": 1314758608000, "entity_id": 298082, "loc": { "type": "point", "coordinates": [103.69478, 1.2346333] } },
  "sort": [1314758608000]
}

But if I get the document directly:

curl -s es01.vesseltracker.com:9200/track_201108/position/298082_1314758608000_1302 | json_pp
{
  "found": true,
  "_version": 1,
  "_type": "position",
  "_index": "track_201108",
  "_source": { "hourly": false, "loc": { "type": "point", "coordinates": [103.69478, 1.2346333] }, "ts": 1314758608000, "entity_id": 298082 },
  "_id": "298082_1314758608000_1302"
}

So I have only one document (and it was never updated, since the version is 1). I don't understand what is going on here: no special routing, no parent/child relations. Any ideas?
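A common cause of hits like this is a wildcard such as track_2011* matching more than one index that holds the same document, or a shard being queried twice while it relocates. Whatever the server-side cause turns out to be, a client-side safety net is to collapse hits that share the same (_index, _id) pair. A minimal sketch, assuming hits in the shape the _search API returns:

```python
def dedupe_hits(hits):
    """Collapse search hits that share the same (_index, _id) pair,
    keeping the first occurrence and preserving sort order."""
    seen = set()
    unique = []
    for hit in hits:
        key = (hit["_index"], hit["_id"])
        if key not in seen:
            seen.add(key)
            unique.append(hit)
    return unique

hits = [
    {"_index": "track_201108", "_id": "298082_1314758608000_1302", "sort": [1314758608000]},
    {"_index": "track_201108", "_id": "298082_1314758608000_1302", "sort": [1314758608000]},
]
print(len(dedupe_hits(hits)))  # → 1: the duplicate pair collapses to a single hit
```

This hides the symptom rather than fixing it, but it makes paging and display deterministic while the cluster-side cause is investigated.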
Big free space imbalance
Hi, I have a 9-node cluster. I notice that the free space is greatly imbalanced: on node1 I have only 90 GB left, while on the other nodes I still have around 180 GB free. I am pretty sure that no new shards will be allocated to that node, as it is above the watermark. I think this started when I upgraded to 1.5, or maybe with the last version of the 1.4 series. Is there anything I can do about it? Thanks in advance, Georgi
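For reference, disk-based shard allocation is governed by the watermark settings; a sketch of the relevant cluster settings one might inspect or adjust (the values here are illustrative defaults, not recommendations):

```json
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.threshold_enabled": true,
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%",
    "cluster.info.update.interval": "60s"
  }
}
```

Raising the watermarks only buys time; rebalancing or freeing disk on the full node is the durable fix.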
shard allocation per node
Hi, what is the rule for primary shard allocation for a single index? I created one index with 9 primary shards and 0 replicas. Elasticsearch allocated 5 primary shards on the ES01 server (the node with the least storage available) and the remaining 4 shards on different nodes. I have 9 servers, yet for this index only 5 of them are in use. I don't think this is correct behavior. Free space is pretty much even on every host except ES01 (I still don't understand why), and I didn't hit any high/low watermarks. Any idea what is happening here? Georgi
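As background, it helps to separate two questions: which shard a document lands in (routing), and which node a shard lands on (allocation). Routing is deterministic: Elasticsearch hashes the routing value (the _id by default) and takes it modulo the number of primary shards. A toy sketch of that idea, using Python's built-in hash rather than the real murmur3, purely for illustration:

```python
def pick_shard(routing_value: str, num_primary_shards: int) -> int:
    """Toy version of document-to-shard routing: hash the routing value
    and take it modulo the primary shard count (real ES uses murmur3)."""
    return hash(routing_value) % num_primary_shards

shard = pick_shard("298082_1314758608000_1302", 9)
print(0 <= shard < 9)  # the document always maps to a valid shard slot
```

Shard-to-node placement, by contrast, is decided by the allocation deciders (balance, watermarks, awareness), which is where the uneven spread described above would come from.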
Re: Number of shards in 4 node Cluster
My rule is: 1 primary shard per server. Also make some estimate of how big the single index/shard will be. I think it is not good if a single shard exceeds 10 GB, although there is no exact limit. Georgi

On Tuesday, March 17, 2015 at 7:00:23 PM UTC+1, John S wrote: Hi all, are there any best practices for the number of shards in a cluster? I have a 4-node cluster and used 20 shards. During a node failure or other events, I suspect that because the shard count is high, replication to a new node takes more time. Is there a metric or formula for the number of shards? Regards, John
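Georgi's rule of thumb can be turned into a quick back-of-the-envelope calculation. The 10 GB target is his heuristic, not an official limit:

```python
import math

def estimate_primary_shards(index_size_gb: float, max_shard_gb: float = 10.0) -> int:
    """Smallest primary shard count that keeps each shard at or under
    the target size (10 GB per Georgi's rule of thumb)."""
    return max(1, math.ceil(index_size_gb / max_shard_gb))

print(estimate_primary_shards(35))   # → 4: four shards of ~9 GB each
print(estimate_primary_shards(120))  # → 12: more shards than a 4-node rule-of-one allows
```

When the size-based count exceeds the node count, the two heuristics conflict and you have to pick: larger shards, or more than one primary per node.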
Re: Elasticsearch Index Polygons?
Like Jun said, you need the geo_shape type. The problem is that indexing shapes (other than POINT) is very slow. I tried with linestrings, and it is extremely slow once a linestring is 10 points long or longer; it just kills the CPU.

On Tuesday, February 24, 2015 at 6:38:37 AM UTC+1, Sai Asuka wrote: So I see that Elasticsearch claims to use GeoJSON as the format for indexing, but when I look at the docs, the example it gives is:

{ "location": { "type": "polygon", "coordinates": [ [ [100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0] ] ] } }

Doesn't GeoJSON look like this?

{ "type": "Feature", "properties": { "name": "Sparkle", "age": 11 }, "geometry": { "type": "polygon", "coordinates": [ [ [100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0] ] ] } }

My question is: how do I index polygons in Elasticsearch if I want to attach properties to them? If I wanted to perform a bulk load, for example, what does one document look like that has polygon information I can perform geospatial queries on?
Re: Elasticsearch Index Polygons?
You can add whatever you want to the index; just define your mapping like this (the tree parameter can be "quadtree" or "geohash"):

{
  "my_type": {
    "_all": { "enabled": false },
    "properties": {
      "field1": { "type": "double", "index": "no" },
      "my_polygon": { "type": "geo_shape", "tree": "quadtree" },
      "field2": { "type": "double", "index": "no" }
    }
  }
}

On Tuesday, February 24, 2015 at 6:38:37 AM UTC+1, Sai Asuka wrote: So I see that Elasticsearch claims to use GeoJSON as the format for indexing, but when I look at the docs, the example it gives is a bare geometry, not a GeoJSON Feature. My question is: how do I index polygons in Elasticsearch if I want to attach properties to them?
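To the question of attaching properties to a polygon: a single document simply carries the extra fields alongside the geo_shape field. A sketch against a mapping like the one above, assuming the index/type names are made up and the property fields are mapped (or dynamically mapped) as ordinary fields:

```json
PUT my_index/my_type/1
{
  "name": "Sparkle",
  "age": 11,
  "my_polygon": {
    "type": "polygon",
    "coordinates": [ [ [100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0] ] ]
  }
}
```

In other words, Elasticsearch takes the GeoJSON geometry object as the field value; the Feature wrapper's properties become sibling fields of the document itself.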
Re: doc_values for non analyzed fields
So there is no sense in reindexing everything just to set doc_values on non-analyzed fields; non-indexed fields are not in fielddata anyway. Right?

2014-11-29 11:50 GMT+01:00 Adrien Grand adrien.gr...@elasticsearch.com: Doc values cannot be used to fetch values; they are only used for sorting, scripts and aggregations. It is like fielddata, but computed at indexing time and stored on disk.

On Fri, Nov 28, 2014 at 4:07 PM, Georgi Ivanov georgi.r.iva...@gmail.com wrote: Hi, will it make any difference in terms of fielddata memory if I set the fielddata format to doc_values for all fields that have the mapping "index": "no"? Are these (non-analyzed) fields ever loaded into memory in the first place? Example field mapping: "rot": { "index": "no", "type": "integer" }. I don't need to query on these fields, but I need to fetch them. Any negative impact on my queries? Thank you, Georgi
doc_values for non analyzed fields
Hi, will it make any difference in terms of fielddata memory if I set the fielddata format to doc_values for all fields that have the mapping "index": "no"? Are these (non-analyzed) fields ever loaded into memory in the first place? Example field mapping: "rot": { "index": "no", "type": "integer" }. I don't need to query on these fields, but I need to fetch them. Any negative impact on my queries? Thank you, Georgi
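For reference, enabling doc values is a per-field mapping setting; a sketch using the field from the question (the surrounding mapping structure is illustrative):

```json
{
  "properties": {
    "rot": {
      "type": "integer",
      "index": "no",
      "doc_values": true
    }
  }
}
```

As Adrien's reply below notes, this only helps fields used for sorting, scripts or aggregations; fields that are merely fetched from _source gain nothing from it.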
Re: ES java api: how to handle connectivity problems?
That's strange. Could it be a problem in your code, something like looping forever? You can set a timeout on the bulk request, but there is a default timeout of one minute. Some code might help.

On Friday, November 28, 2014 3:09:37 PM UTC+1, msbr...@gmail.com wrote: While testing how to handle ES-cluster connectivity issues I ran into a serious problem. The Java API node client is connected, and then the ES server is killed. The application hangs in some bulk request, and the call never returns; it does not return even after the cluster is started again. On the console this exception is shown:

Exception in thread "elasticsearch[event-collector/12240@amnesia][generic][T#2]" org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];[SERVICE_UNAVAILABLE/2/no master];
    at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:138)
    at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:128)
    at org.elasticsearch.action.bulk.TransportBulkAction.executeBulk(TransportBulkAction.java:197)
    at org.elasticsearch.action.bulk.TransportBulkAction.access$000(TransportBulkAction.java:65)
    at org.elasticsearch.action.bulk.TransportBulkAction$1.onFailure(TransportBulkAction.java:143)
    at org.elasticsearch.action.support.TransportAction$ThreadedActionListener$2.run(TransportAction.java:119)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

I am wondering why this scenario does not work. Any other scenario, e.g. shutting down 1 of 2 nodes, is handled transparently, but here the client application seems to hang forever. Any ideas? Regards, markus
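One defensive pattern, independent of the ES client API (which is Java here), is to never let a blocking indexing call hang the caller: run it under an explicit deadline and treat a timeout as a failure. A minimal sketch of the idea in Python, with the work function standing in for the real client call:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def call_with_deadline(fn, timeout_s=60.0):
    """Run a potentially blocking call in a worker thread and give up
    after timeout_s seconds instead of hanging the caller forever."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(fn).result(timeout=timeout_s)
    except FutureTimeout:
        raise RuntimeError("call did not complete within %.1fs" % timeout_s)
    finally:
        # don't block on the (possibly stuck) worker during cleanup
        pool.shutdown(wait=False)

print(call_with_deadline(lambda: "indexed", timeout_s=1.0))
```

The stuck worker thread is not killed (Python, like Java, cannot forcibly stop a thread), but the application regains control and can retry, alert, or fail fast.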
Re: Shards UNASSIGNED even tho they exist on disk
Sounds like you don't have enough space on the disk(s). This happened to me when upgrading to 1.4.

On Monday, November 10, 2014 2:45:09 PM UTC+1, Johan Öhr wrote: Hi, I have a problem with a few indices: some of the shards (both replica and primary) are UNASSIGNED, and my cluster stays yellow. This is what the master says about it:

[2014-11-10 06:53:01,223][WARN ][cluster.action.shard] [node-master] [index][9] received shard failed for [index][9], node[9g2_kOrDSt-57UVI1bLfFg], [P], s[STARTED], indexUUID [20P5SMNFTZyrUEVyUPCsbQ], reason [master [node-master][07ZcjsurR3iIVsH6iSX0jw][data-node][inet[/xx.xx.xx.xx:9300]]{data=false, master=true} marked shard as started, but shard has not been created, mark shard as failed]

http://host:9200/index/_stats shows _shards: { "failed": 0, "successful": 13, "total": 20 }

This happened when I dropped a node and let the cluster replicate itself back together; the replication factor is 1 (two identical copies of each shard). I did this on two nodes and it worked perfectly; then on the third node I have 92 shards unassigned. The only difference between the first two nodes and the third is that the third ran with these settings:

cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: 0.85
cluster.routing.allocation.disk.watermark.high: 0.90
cluster.info.update.interval: 60s
indices.recovery.concurrent_streams: 10
cluster.routing.allocation.node_concurrent_recoveries: 40

Any idea if this can be fixed? I've tried to clean up the masters and restart them: nothing. I've tried to delete _state for these indices on the data node: nothing. Thanks for the help :)
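If disk space is indeed the blocker, freeing space (or temporarily raising the watermarks) usually lets the shards recover on their own. As a last resort, an unassigned shard can be manually assigned with the reroute API; a sketch with placeholder index and node names:

```json
POST _cluster/reroute
{
  "commands": [
    {
      "allocate": {
        "index": "index",
        "shard": 9,
        "node": "data-node",
        "allow_primary": false
      }
    }
  ]
}
```

Setting "allow_primary": true can cause data loss (it creates an empty primary), so it should only be used when the shard data is known to be gone.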
Re: how to search non indexed field in elasticsearch
A wildcard query is also working on non-indexed fields.

On Friday, November 7, 2014 8:11:30 AM UTC+1, ramky wrote: Thanks, Nikolas Everett, for your quick reply. Can you please give me an example of how to execute this? I tried multiple times but was unable to. Thanks in advance.

On Thursday, November 6, 2014 9:44:55 PM UTC+5:30, Nikolas Everett wrote: You can totally use a script filter checking the field against _source. It's super duper duper slow, but you can do it if you need it rarely.

On Thu, Nov 6, 2014 at 11:13 AM, Ivan Brusic iv...@brusic.com wrote: You cannot search/filter on a non-indexed field. -- Ivan

On Wed, Nov 5, 2014 at 11:45 PM, ramakrishna panguluri panguluri@gmail.com wrote: I have 10 fields inserted into Elasticsearch, of which 5 are indexed. Is it possible to search on a non-indexed field? Thanks in advance. Regards, Rama Krishna P
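Nikolas's slow, last-resort suggestion looks roughly like this: a script filter reading the raw value out of _source. Field name, value, and exact script syntax are illustrative (script syntax varies with the scripting language and ES version in use):

```json
POST my_index/_search
{
  "query": {
    "filtered": {
      "filter": {
        "script": {
          "script": "_source.my_unindexed_field == expected",
          "params": { "expected": "foo" }
        }
      }
    }
  }
}
```

Because this loads and evaluates _source for every candidate document, it scales linearly with the index size; it is only viable for rare, small queries.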
Re: I have a few million users, and I want to index for per user, but.....
Aliases, maybe? So: one index, many aliases (user_1, user_2, ...).

On Friday, November 7, 2014 9:09:03 AM UTC+1, David shi wrote: I have a few million users, and the number will continue to grow, maybe to 10 million within a year. Each user has a lot of files, and the file sizes vary, maybe from 1 MB to 10 MB. What I need to do is index each user's documents and let the current user quickly search their documents for the right content. My first thought was to build an index per user, but there are limits on the number of files in a single Linux directory. Do you have any good suggestions for me? Thank you very much!
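The alias-per-user idea usually means a filtered alias, so each user's alias only exposes that user's documents within the shared index. A sketch with illustrative index, alias, and field names:

```json
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "documents",
        "alias": "user_1",
        "filter": { "term": { "user_id": 1 } }
      }
    }
  ]
}
```

Queries sent to user_1 then behave as if that user had a private index, without the per-index overhead of millions of real indices.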
Re: how to search non indexed field in elasticsearch
Taken from here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-wildcard-query.html#query-dsl-wildcard-query — "Matches documents that have fields matching a wildcard expression (*not analyzed*)". I also use wildcard on non-analyzed fields, and it is working.

2014-11-07 9:44 GMT+01:00 David Pilato da...@pilato.fr: "wildcard query is also working on non-indexed fields." Are you sure? I don't think so. -- David Pilato | Technical Advocate | Elasticsearch.com | @dadoonet | @elasticsearchfr | @scrutmydocs
Upgrade to Es 1.4.0. Backup shards not initializing
Hi, I just upgraded to 1.4. I don't see any errors, but only the primary shards are initialized; the replicas are not. Any idea what is happening? Georgi
Re: Upgrade to Es 1.4.0. Backup shards not initializing
Found it: I had less than 15% free space on the disks, and allocation was disabled. The annoying part is that I had to enable DEBUG in logging.yml just to see this! I will file a bug report; this should be logged at WARNING level at least. Hope this helps someone else. Georgi

On Thursday, November 6, 2014 2:07:04 PM UTC+1, Georgi Ivanov wrote: Hi, I just upgraded to 1.4. I don't see any errors, but only the primary shards are initialized. Any idea what is happening? Georgi
Re: Performance problems with large data volumes
OK, so it is Java.

1. You are not doing this right.
2. You should use BulkRequest, or better, the BulkProcessor class.
3. Do NOT call setRefresh! That forces ES to do a real refresh on every request, which loads the cluster a LOT.
4. Set the refresh interval of your index to something like 30s or 60s.

Here is a snippet of code using BulkProcessor (it will not run as-is, because I removed some parts, but it will give you an idea):

public class IndexFoo {
    private Connection connection = null;
    public Client client;
    Integer bulkSize = 1000;
    private CommandLine cmd;
    // BulkRequestBuilder bulkRequest;
    BulkProcessor bulkRequest;
    private String index;
    Set<String> hosts = new HashSet<String>();
    private int threads = 5;

    public IndexFoo(CommandLine cmd) throws SQLException, ParseException {
        this.cmd = cmd;
        this.index = cmd.getOptionValue("index");
        if (cmd.hasOption("b")) {
            this.bulkSize = Integer.valueOf(cmd.getOptionValue("b"));
        }
        if (cmd.hasOption("t")) {
            this.threads = Integer.valueOf(cmd.getOptionValue("t"));
        }
        if (cmd.hasOption("h")) {
            for (String host : cmd.getOptionValue("h").split(",")) {
                this.hosts.add(host);
            }
        }
        this.connectES();
        this.bulkRequest = this.getBulkProcessor();
    }

    private void processData(ResultSet rs) throws SQLException {
        while (rs.next()) {
            // id and mySource come from the result set (omitted here);
            // BulkProcessor flushes batches automatically
            bulkRequest.add(client.prepareIndex("myIndex", "mytype", id.toString())
                    .setSource(mySource).request());
        }
        this.bulkRequest.close();
        System.out.println("Indexing done");
    }

    private BulkProcessor getBulkProcessor() {
        return BulkProcessor.builder(client, new BulkProcessor.Listener() {
            @Override
            public void beforeBulk(long executionId, BulkRequest request) {
                // System.out.println("Executing bulk #" + executionId + " " + request.numberOfActions());
            }

            @Override
            public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
            }

            @Override
            public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
                System.out.println("Bulk #" + executionId + "/" + request.numberOfActions()
                        + " executed in " + response.getTook().secondsFrac() + " sec.");
                if (response.hasFailures()) {
                    for (BulkItemResponse bulkItemResponse : response.getItems()) {
                        if (bulkItemResponse.isFailed()) {
                            System.err.println("Failure message: " + bulkItemResponse.getFailureMessage());
                        }
                    }
                    System.exit(-1);
                }
            }
        }).setConcurrentRequests(this.threads).setBulkActions(this.bulkSize).build();
    }
}

2014-11-04 17:53 GMT+01:00 John D. Ament john.d.am...@gmail.com: And actually, now that I'm looking at it again, I wanted to ask why I need to use setRefresh(true)? In my case we were not seeing index data updated quickly enough after indexing a record; setting refresh = true fixed that for us. If there's a way to avoid it, that might help me here?

On Tuesday, November 4, 2014 11:37:46 AM UTC-5, John D. Ament wrote: Georgi, I'm indexing the data through a regular index request via Java:

final IndexResponse response = esClient.client().prepareIndex(indexName, type)
        .setSource(json).setRefresh(true).execute().actionGet();

json in this case is a byte[] with the JSON data in it. The requests come in via multiple HTTP requests, but I'm not leveraging any specific multithreading within the ES client. I hope this helps; I'm not 100% sure what information would help identify the issue. John

On Tuesday, November 4, 2014 11:35:06 AM UTC-5, Georgi Ivanov wrote: So you run OOM when you index data? If so: How do you index the data? Are you using BulkRequest? Which programming language are you using? Are you using multiple threads to index? If you are using bulk requests, you should limit the size of the bulk. You can also tune the bulk request pool in ES. In general, you are very brief in describing your problem :) Georgi

2014-11-04 17:05 GMT+01:00 John D. Ament john.d...@gmail.com: Georgi, thanks for the quick reply! I have 4k indices; we're creating an index per tenant, and in this environment we've created 4k tenants. We're running out of memory just letting the loading of records run. John

On Tuesday, November 4, 2014 10:15:15 AM UTC-5, Georgi Ivanov wrote: Hi, I don't think 24k documents are large data. What is strange for me is 4000 indices. How many indices do you need? On my cluster I have: Nodes: 8, Indices: 89, Shards: 2070, Data: 4.87 TB. When do you run OOM? Example query(ies)? How many nodes? Some more info please :) Also, a 6 GB heap is not too much, but that depends on your use case. Georgi

On Tuesday, November 4, 2014 3:42:19 PM UTC+1, John D. Ament wrote: Hi, so I have what you might consider a large set of data. We have about 25k records in our index, and the disk space is taking up around 2.5 GB, spread across a little more than 4000 indices. Currently our master node is set for 6 GB of RAM. We're seeing that after loading this data the JVM will eventually crash, sometimes in as little as 5 minutes. Is this not enough horse power for this data set
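The "limit the size of the bulk" advice above boils down to chunking the document stream before sending; the mechanics are language-independent. A minimal Python sketch of what BulkProcessor's setBulkActions does:

```python
def chunked(docs, bulk_size=1000):
    """Yield documents in fixed-size batches, like BulkProcessor's
    setBulkActions: flush every bulk_size actions, plus a final partial batch."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) >= bulk_size:
            yield batch
            batch = []
    if batch:
        yield batch

batches = list(chunked(range(2500), bulk_size=1000))
print([len(b) for b in batches])  # → [1000, 1000, 500]
```

Each yielded batch would become one bulk request; bounding the batch size bounds both the request payload and the memory held on the client and server.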
Re: Performance problems with large data volumes
Here is how to set refresh interval: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-update-settings.html When you force refresh after every document, you are putting unnecessary load to ES. Indexing single document in a single call is completely fine, but is also very slow and inefficient :) This way you are also utilizing the available indexing threads in ES. You can read the documentation about this here : http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html If you use bulk request , you can index (tens)thousands of docs per second, depending on your hardware. With BulkProcessor class you can set how many threads will run, how may document will be sent in one bulk.. etc. It is much more efficient then indexing single document. 2014-11-05 12:53 GMT+01:00 John D. Ament john.d.am...@gmail.com: Hi, I doubt the issue is that I'm not using bulk requests. My requests come in one at a time, not in bulk. If you can explain why bulk is required that would help. I can believe that the refresh is causing the issue. I would prefer to test that one by itself. How do I configure the refresh interval on the index? John On Wednesday, November 5, 2014 3:43:37 AM UTC-5, Georgi Ivanov wrote: Ok .. so it is Java 1. You are not doing this right . 2. You should use BulkRequest or better BulkProcessor class 3. Do NOT do setRefresh ! This way you are forcing ES to do the real indexing which will load the cluster a LOT 4. 
Set the refresh interval of your index to something like 30s or 60s. Here is a snippet of code using BulkProcessor (it will not run, because i removed some parts, but it will give you an idea):

public class IndexFoo {
    private Connection connection = null;
    public Client client;
    Integer bulkSize = 1000;
    private CommandLine cmd;
    //BulkRequestBuilder bulkRequest;
    BulkProcessor bulkRequest;
    private String index;
    Set<String> hosts = new HashSet<String>();
    private int threads = 5;

    public IndexFoo(CommandLine cmd) throws SQLException, ParseException {
        this.cmd = cmd;
        this.index = cmd.getOptionValue("index");
        if (cmd.hasOption("b")) {
            this.bulkSize = Integer.valueOf(cmd.getOptionValue("b"));
        }
        if (cmd.hasOption("t")) {
            this.threads = Integer.valueOf(cmd.getOptionValue("t"));
        }
        if (cmd.hasOption("h")) {
            String[] hosts = cmd.getOptionValue("h").split(",");
            for (String host : hosts) {
                this.hosts.add(host);
            }
        }
        this.connectES();
        this.bulkRequest = this.getBulkProcessor();
    }

    private void processData(ResultSet rs) throws SQLException {
        while (rs.next()) {
            // index
            bulkRequest.add(client.prepareIndex("myIndex", "mytype", id.toString())
                    .setSource(mySource).request());
        } // while
        this.bulkRequest.close();
        System.out.println("Indexing done");
    }

    private BulkProcessor getBulkProcessor() {
        return BulkProcessor.builder(client, new BulkProcessor.Listener() {
            @Override
            public void beforeBulk(long executionId, BulkRequest request) {
                //System.out.println("Executing bulk #" + executionId + " " + request.numberOfActions());
            }

            @Override
            public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
            }

            @Override
            public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
                System.out.println("Bulk #" + executionId + "/" + request.numberOfActions()
                        + " executed in " + response.getTook().secondsFrac() + " sec.");
                if (response.hasFailures()) {
                    for (BulkItemResponse bulkItemResponse : response.getItems()) {
                        if (bulkItemResponse.isFailed()) {
                            System.err.println("Failure message: " + bulkItemResponse.getFailureMessage());
                        }
                    }
                    System.exit(-1);
                }
            }
        }).setConcurrentRequests(this.threads).setBulkActions(this.bulkSize).build();
    }
}

2014-11-04 17:53 GMT+01:00 John D. Ament john.d...@gmail.com: And actually now that I'm looking at it again - I wanted to ask why I need to use setRefresh(true)? In my case, we were not seeing index data updated quickly enough upon indexing a record. Setting refresh = true was doing it for us. If there's a way to avoid it, that might help me here? On Tuesday, November 4, 2014 11:37:46 AM UTC-5, John D. Ament wrote: Georgi, I'm indexing the data through a regular index request via java: final IndexResponse response = esClient.client().prepareIndex(indexName, type).setSource(json).setRefresh(true).execute().actionGet(); json in this case is a byte[] with the json data in it. The requests come in via multiple HTTP requests, but I'm not leveraging any specific multithreading within the ES client. I hope this helps, I'm not 100% sure what information would help identify. John On Tuesday, November 4, 2014 11:35:06 AM UTC-5, Georgi Ivanov wrote: So you run OOM when you index data ? If so : How do you index the data ? Are you using BulkRequest ? Which programming language are you using ? Are you using multiple threads to index ? If you are using Bulk request , you should limit the size
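The two changes discussed in this thread can also be made over the REST API; a sketch, with the host, index, and type names invented for illustration:

```shell
# Relax the refresh interval so Elasticsearch is not forced to refresh
# after every document (index name "myindex" is illustrative).
curl -XPUT 'localhost:9200/myindex/_settings' -d '
{ "index" : { "refresh_interval" : "30s" } }'

# Index documents in batches through the _bulk endpoint instead of one
# index request per document (newline-delimited action/source pairs).
curl -XPOST 'localhost:9200/myindex/mytype/_bulk' -d '
{ "index" : { "_id" : "1" } }
{ "field1" : "value1" }
{ "index" : { "_id" : "2" } }
{ "field1" : "value2" }
'
```

A common variant of the same idea is setting refresh_interval to -1 during a large load and restoring it afterwards.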
Re: Performance problems with large data volumes
Hi, I don't think 24k documents are large data. What is strange for me is the 4000 indices. This is strange .. how many indices do you need? On my cluster i have: Nodes: 8 Indices: 89 Shards: 2070 Data: 4.87 TB When are you running OOM? Example query(ies)? How many nodes? Some more info please :) Also, 6GB heap is not too much, but that depends on your use case. Georgi On Tuesday, November 4, 2014 3:42:19 PM UTC+1, John D. Ament wrote: Hi, So I have what you might want to consider a large set of data. We have about 25k records in our index, and the disk space is taking up around 2.5 gb, spread across a little more than 4000 indices. Currently our master node is set for 6gb of ram. We're seeing that after loading this data the JVM will eventually crash, sometimes in as little as 5 minutes. Is this not enough horse power for this data set? What could be tuned to resolve this? John -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ee8b784c-2fd5-403d-853e-5a1e893831dd%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Performance problems with large data volumes
So you run OOM when you index data? If so: How do you index the data? Are you using BulkRequest? Which programming language are you using? Are you using multiple threads to index? If you are using bulk requests, you should limit the size of the bulk. You can also tune the bulk request pool in ES. In general, you are very brief in describing your problem :) Georgi 2014-11-04 17:05 GMT+01:00 John D. Ament john.d.am...@gmail.com: Georgi, Thanks for the quick reply! I have 4k indices. We're creating an index per tenant. In this environment we've created 4k tenants. We're running out of memory just letting the loading of records run. John On Tuesday, November 4, 2014 10:15:15 AM UTC-5, Georgi Ivanov wrote: Hi, I don't think 24k documents are large data. What is strange for me is the 4000 indices. This is strange .. how many indices do you need? On my cluster i have: Nodes: 8 Indices: 89 Shards: 2070 Data: 4.87 TB When are you running OOM? Example query(ies)? How many nodes? Some more info please :) Also, 6GB heap is not too much, but that depends on your use case. Georgi On Tuesday, November 4, 2014 3:42:19 PM UTC+1, John D. Ament wrote: Hi, So I have what you might want to consider a large set of data. We have about 25k records in our index, and the disk space is taking up around 2.5 gb, spread across a little more than 4000 indices. Currently our master node is set for 6gb of ram. We're seeing that after loading this data the JVM will eventually crash, sometimes in as little as 5 minutes. Is this not enough horse power for this data set? What could be tuned to resolve this? John
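To see why 4000 indices on a 6 GB heap hurts, count the shards; a back-of-envelope sketch assuming the Elasticsearch defaults of 5 primary shards and 1 replica per index (the thread does not state the actual settings):

```shell
# 4000 indices x 5 primaries x (primary + 1 replica) shard copies.
# Every shard is a full Lucene index with its own segments, file
# handles, and heap overhead, regardless of how few documents it holds.
echo $((4000 * 5 * (1 + 1)))
```

That is on the order of 40,000 shards for roughly 25k documents, so the per-shard overhead, not the data volume, is the likely cause of the OOM.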
ES and Java 8. Is it worth the effort?
Hi, I wonder if i should start using Java 8 with my ES cluster. Are there any benefits to using Java 8? For example: faster GC, faster Java itself .. anything ES would benefit from in Java 8, etc. Please share your experience. Georgi
Re: Elasticsearch support for Java 1.8?
As far as I know, ES will work just fine with Java 1.8, except for script support. I read some articles on the Internet saying that scripting support is broken with Java 1.8. But I would love to hear from someone who actually tried :) On Tuesday, June 17, 2014 3:19:37 PM UTC+2, Chris Neal wrote: Hi, I saw this blog post from April stating Java 1.7u55 as being safe for Elasticsearch, but I didn't see anything about Java 1.8 support. Just wondering if it was :) http://www.elasticsearch.org/blog/java-1-7u55-safe-use-elasticsearch-lucene/ Thanks! Chris
Re: Problem setting up cluster with NAT address
Doesn't sound like an Elasticsearch issue ... I would look at my firewall rules. On Tuesday, June 17, 2014 2:17:20 PM UTC+2, pmartins wrote: Hi, I'm having some problems setting up a 1.2.1 ES cluster. I have two nodes, each one in a different data center/network. One of the nodes is behind a NAT address, so I set network.publish_host to the NAT address. Both nodes connect to each other without problems. The issue is when the node behind the NAT address tries to connect to itself. In my network, it doesn't know its NAT address and can't resolve it. So I get the exception: [2014-06-17 12:58:19,681][WARN ][cluster.service ] [vm-motisqaapp02] failed to reconnect to node [vm-motisqaapp02][4oSfsIaBTSyQWdnxiTt7Cw][vm-motisqaapp02.***][inet[/10.10.1.135:9300]]{master=true} org.elasticsearch.transport.ConnectTransportException: [vm-motisqaapp02][inet[/10.10.1.135:9300]] connect_timeout[30s] at org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:727) at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:656) at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:624) at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:146) at org.elasticsearch.cluster.service.InternalClusterService$ReconnectToNodes.run(InternalClusterService.java:518) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) Caused by: org.elasticsearch.common.netty.channel.ConnectTimeoutException: connection timed out: /10.10.1.135:9300 at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:137) at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:83) at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318) at
org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42) at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) ... 3 more vm-motisqaapp02's NAT address is 10.10.1.135, but locally it can't resolve this address. Is there any way that I can set up another IP to communicate locally? -- View this message in context: http://elasticsearch-users.115913.n3.nabble.com/Problem-setting-up-cluster-with-NAT-address-tp4057849.html Sent from the ElasticSearch Users mailing list archive at Nabble.com.
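For reference, the 1.x network settings distinguish the address a node binds to from the one it advertises; a sketch for the NAT-ed node (the addresses come from the thread, and whether this helps depends on the node being able to route to its own NAT address):

```yaml
# elasticsearch.yml on the NAT-ed node: bind on a local interface,
# but publish the NAT address that the remote data center can reach.
network.bind_host: 0.0.0.0
network.publish_host: 10.10.1.135
```

The exception above shows the node timing out against its own publish address, so the other half of the fix is a firewall or hairpin-NAT rule that lets the host reach 10.10.1.135:9300 from inside its own network.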
Re: Share a document across multiple indices
Will aliases help you in this case? For example: index1: [doc1], index2: [doc2]. Create an alias docs for index1 and index2, then run queries against the alias? On Monday, June 16, 2014 3:51:45 AM UTC+2, Martin Angers wrote: Hi, I'm wondering if this is a supported scenario in ElasticSearch; reading the guide and API reference I couldn't find a way to achieve this. I'd like to index documents only once, say in a master index, and then create secondary or meta indices that would only contain a subset of the master index. For example, documents A, B and C would be indexed once in the master index. Then a secondary index would be able to see only documents A and B, while another secondary index could see only documents B and C, etc. (and by see I mean the search queries should only consider those documents). The idea being that documents could be relatively big, and they should not be indexed multiple times. Does that make sense? Am I missing the right way to design such a pattern? I am new to ES. Thanks, Martin
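A sketch of the alias approach, including a filtered alias, which is one way to expose only a subset of a single master index (all index, alias, and field names here are invented for illustration):

```shell
curl -XPOST 'localhost:9200/_aliases' -d '
{
  "actions" : [
    { "add" : { "index" : "index1", "alias" : "docs" } },
    { "add" : { "index" : "index2", "alias" : "docs" } },
    { "add" : { "index" : "master", "alias" : "subset_ab",
                "filter" : { "terms" : { "doc_label" : ["a", "b"] } } } }
  ]
}'
```

With this, each document is indexed only once (in master), while searches against subset_ab see only the documents matching the alias filter.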
Re: Cannot Increase Write TPS in Elasticsearch by adding more nodes
I don't know how you are doing the indexing. Are you using bulk requests or ..? Bulk insert can greatly increase indexing speed. You can also check the node client. It should have better indexing speed because it will be a 1-hop operation, compared to two hops with the transport client (assuming the Java API here). You can hit the limits of the bulk thread pool (it can be increased) if you are sending all indexing ops to only one server; one could try to hit all nodes on a round-robin basis. You can monitor IOPS in Marvel (or iostat locally on the server) to see if you are not hitting the IO limit. On my ES cluster i reach 50k indexing ops per second. On Monday, June 9, 2014 5:40:53 PM UTC+2, pranav amin wrote: Hi all, While doing some prototyping in ES using SSD's we got some good Write TPS. But the Write TPS saturated after adding some more nodes! Here are the details i used for prototyping - Requirement: To read data as soon as possible since the read is followed by write. Version of ES: 1.0.0 Document Size: 144 KB Use of SSD for Storage: Yes Benchmarking Tool: SoapUI or JMeter VM: Ubuntu, 64 Bit OS Total Nodes: 12 Total Shards: 60 Threads: 200 Replica: 2 Index Shards: 20 Total Index: 1 Hardware configuration: 4 CPU, 6 GB RAM, 3 GB Heap Using the above setup we got Write TPS ~= 500. We wanted to know if by adding more nodes we can increase our Write TPS. But we couldn't. * By adding 3 more nodes (i.e. Total Nodes = 15) the TPS just increased by 10, i.e. ~= 510. * Adding more hardware like CPU, RAM and increasing heap didn't help as well [8 CPU, 12 GB RAM, 5 GB Heap]. Can someone help out or point out ideas on what might be wrong? Conceptually ES should scale in terms of Write/Read TPS by adding more nodes. However we aren't able to get that. Much appreciated if someone can point us in the right direction. Let me know if more information is needed. Thanks Pranav.
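One way to check the bulk thread-pool limit mentioned above is the _cat API (column names per the 1.x _cat/thread_pool reference; the host is illustrative):

```shell
# Non-zero bulk.rejected means clients are outrunning the bulk queue,
# which caps write TPS no matter how many nodes are added.
curl 'localhost:9200/_cat/thread_pool?v&h=host,bulk.active,bulk.queue,bulk.rejected'
```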
How to get rid of org.elasticsearch.plugins information logging
Hello, How can i get rid of: Jun 16, 2014 10:38:13 AM org.elasticsearch.plugins Information: [Thinker] loaded [], sites [] every time my client connects to ES? It is not a big problem, but this output is messing with my shell scripts. I am using the transport client, if this matters. Is this some log4j configuration? I am not using log4j atm. Regards, Georgi
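The message comes from the client-side logger, so silencing it is a logging-configuration question. A sketch of a log4j.properties that could sit on the client classpath (assumption: log4j is actually the backend in use; the JDK-style timestamp in the output suggests the client may be falling back to java.util.logging, in which case the equivalent would be a JUL properties file raising org.elasticsearch.plugins above INFO):

```properties
# Raise the default threshold so routine client-side INFO chatter,
# including the plugin-loading line, is suppressed.
log4j.rootLogger=WARN, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=[%d{ISO8601}][%-5p][%c] %m%n
log4j.logger.org.elasticsearch.plugins=WARN
```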
Re: Elasticsearch and Hadoop Questions
Please find my comments below: From what you said above that means that you can not run ES queries on data in Hadoop over something like a 6 month time range without it having to pull in all that data and index it first. - CORRECT. ES queries can run only on ES. And I am assuming that the opposite is all correct, that Hadoop can not run jobs on data in ES without it first pulling in that data to its storage first. - NOT CORRECT. The thing is, you can run MR jobs against data stored in ES (via EsInputFormat). So you can do some really cool stuff reading (and writing) data from ES and then use the power of MR to process/analyze/dowhateveryouwant the data. In the most common case, with a Hadoop MR job you do the following: 1. Job config: input, output, input format, output format, etc. 2. Mapper - process each line of the input (stored on HDFS) and eventually emit key/val to the Reducer. 3. In the reducer, process all values for one key and eventually emit again to the output (on HDFS). With es-hadoop you can set the job's input data to be read from ES (so step 1) and then all other steps can be the same. I am giving you some typical scenarios: 1. Read (via ES query) from ES 1.1 Process the data in a MR job 1.2 Store the output to HDFS [OR store the output to ES again (ES indexing operation)] 2. Run a MR job against data stored on HDFS 2.1 Process the data 2.2 Store the output to ES (ES indexing) Cheers Georgi 2014-06-09 13:47 GMT+02:00 ES USER es.user.2...@gmail.com: Thanks. So just one final question. From what you said above that means that you can not run ES queries on data in Hadoop over something like a 6 month time range without it having to pull in all that data and index it first. And I am assuming that the opposite is all correct that Hadoop can not run jobs on data in ES without it first pulling in that data to its storage first. On Friday, June 6, 2014 5:03:03 PM UTC-4, Costin Leau wrote: ES stores data in its own internal format, which typically resides locally.
What you are stating is partially correct - with the connector you would move/copy data between Hadoop and ES since, in order for ES to work with data, it needs to actually index it (that is, to see it). So you would use es-hadoop to index data from Hadoop in ES or/and query ES directly from Hadoop. On Fri, Jun 6, 2014 at 9:29 PM, ES USER es.use...@gmail.com wrote: I guess the problem I'm having is wrapping my head around exactly where the data is residing and in what format. If I understand Georgi's email above, is it that you can run map reduce jobs against data stored in local ES by utilizing es-hadoop, and you can also run ES queries against data in Hadoop utilizing es-hadoop. Is that correct? On Friday, June 6, 2014 12:39:44 PM UTC-4, Costin Leau wrote: Adding to what Georgi wrote, es-hadoop does not create the shards for you - that's up to you or index templates (which I highly recommend). However es-hadoop is aware of the target shards and will use them to parallelize the reads/writes (such as one task per shard). On Fri, Jun 6, 2014 at 2:45 PM, Georgi Ivanov georgi@gmail.com wrote: and i don't think this is anyhow related to the number of shards and nodes On Thursday, June 5, 2014 7:41:34 PM UTC+2, ES USER wrote: Try as I might, and I have read all the stuff I can find on ES' website about this, I understand somewhat how the integration works but not the actual nuts and bolts of it. For example: Is Hadoop just storing the files that would normally be stored in the local filesystem for the ES indexes, or is it storing the data that would normally be in those indexes and just accessed through es-hadoop? If it is the latter, how do you go about determining what to set for the number of nodes and shards? If anyone has any information on this, or even better yet a place to point me to that has better references so that I can research this on my own, it would be much appreciated. Thanks.
Re: Elasticsearch and Hadoop Questions
Hmm, i am not sure i understand your questions. Hadoop is a distributed storage system (HDFS) and a Map-Reduce framework (MR) (among other things). ES is a distributed storage/search system (among other things). So what es-hadoop is giving you: You can read data from ES, and do some complex analysis, taking advantage of MR. You can write data to ES - one can process some data stored on HDFS and write some pre-aggregated data to ES, for example. es-hadoop is basically a connector between ES and Hadoop. I hope this helps. On Thursday, June 5, 2014 7:41:34 PM UTC+2, ES USER wrote: Try as I might, and I have read all the stuff I can find on ES' website about this, I understand somewhat how the integration works but not the actual nuts and bolts of it. For example: Is Hadoop just storing the files that would normally be stored in the local filesystem for the ES indexes, or is it storing the data that would normally be in those indexes and just accessed through es-hadoop? If it is the latter, how do you go about determining what to set for the number of nodes and shards? If anyone has any information on this, or even better yet a place to point me to that has better references so that I can research this on my own, it would be much appreciated. Thanks.
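As a concrete anchor for "read from ES via EsInputFormat", these are the core es-hadoop job settings; the values (cluster address, index/type, query) are invented for illustration:

```properties
# Where the connector finds the cluster, which index/type backs the
# MR job's input, and an optional query pushed down to ES.
es.nodes=es01:9200
es.resource=myindex/mytype
es.query=?q=some_field:some_value
```

The same es.resource setting is used on the output side when an MR job writes its results back into ES.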
Unable to get document by id
Hi, Something strange here .. I can find a document when searching for it, but i cannot get it by ID. For example: { query: { bool: { must: [ { term: { position.ship_id: 50132}}, { term: { ts: 138524314}}], should: [ ]}} } Result is OK: { took: 5 timed_out: false _shards: { total: 8 successful: 8 failed: 0} hits: { total: 1 max_score: 1.4142135 hits: [ { _index: track_201311 _type: position _id: 50132_138524314_-1_5.4194833_57.402333 _score: 1.4142135 _source: { hourly: true ts: 138524314 ship_id: 50132 }}]}} But when i try this: curl 'http://localhost:9200/track_201311/position/50132_138524314_-1_5.4194833_57.402333' I get this: {_index:track_201311,_type:position,_id: 50132_138524314_-1_5.4194833_57.402333,found:false} I think this started when i upgraded to ES 1.2. Any idea what is going on?
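One thing worth checking: if the document was originally indexed with a custom routing value (or a parent), a search still finds it, but a GET by ID looks on the wrong shard unless the same routing is supplied. A sketch, where the routing value is only a guess:

```shell
# GET with explicit routing; "50132" (the ship_id) is a guess at what
# the original routing value might have been.
curl 'localhost:9200/track_201311/position/50132_138524314_-1_5.4194833_57.402333?routing=50132'
```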
cluster.routing.allocation.cluster_concurrent_rebalance not respected?
Hi, In elasticsearch.yml i have: cluster.routing.allocation.cluster_concurrent_rebalance: 6 Still i see: curl http://localhost:9200/_cat/health?v epoch timestamp cluster status node.total node.data shards pri relo init unassign 131043 16:24:03 mycluster green 8 8 676 338 2 0 0 The number of relocating shards sometimes goes up to 3, but i never see it go to 6. Am i missing something here?
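If the yml value is suspected of not being picked up, the same setting can be applied and read back at runtime; a sketch:

```shell
# Apply the setting transiently through the cluster-settings API...
curl -XPUT 'localhost:9200/_cluster/settings' -d '
{ "transient" : { "cluster.routing.allocation.cluster_concurrent_rebalance" : 6 } }'

# ...and verify what the cluster actually has:
curl 'localhost:9200/_cluster/settings?pretty'
```

Note that the setting is an upper bound, not a target: other allocation deciders and recovery throttles can keep the actual number of concurrent relocations below it.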
Elasticsearch primary shards (re)location
Hi, I have the following situation: I have an 8 node cluster. Periodically some nodes are restarted, and their primary shards are allocated on other nodes. After a node is back, it contains far fewer primary shards than the rest of the nodes. Now i have a situation where one node holds many primary shards, while other nodes hold only a few. For example i have a node with 20+ primary shards, and a node with 3 primary shards. Is this a problem? What will happen when the node with many primary shards fails? Thanks
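One way to see the primary distribution with stock tooling (_cat/shards prints one line per shard copy; the awk one-liner is mine and assumes single-token node names):

```shell
# Count primaries ("p" in the prirep column) per node.
curl -s 'localhost:9200/_cat/shards' |
  awk '$3 == "p" { count[$NF]++ } END { for (n in count) print n, count[n] }'
```

As for the failure question: when a node holding primaries fails, Elasticsearch promotes the replica copies on the surviving nodes to primaries, so an uneven primary count is usually a balance/performance concern rather than a durability one.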
Re: Difference between geo_point and geo_shape (point)
Thanks Alex, That makes perfect sense. For now I am sticking with the geo_shape type here. Except for the index size, everything is much smoother. I would recommend geo_shape if one needs geo queries all the time (like me). George 2014-03-31 9:09 GMT+02:00 Alexander Reelsen a...@spinscale.de: Hey, this is all about storing and computing. First, let's take a look at geo_point * Index: Is stored as two floats lat/lon in the index * Query: All geo points are loaded into memory (thus your big fielddata) and then in-memory calculations are executed Now the geo_shape * Index: The shape is converted into terms and then stored in the index (thus your big index size) * Query: A full-text search is basically used to check if a shape is inside of another (do they include the same terms?) Possible speed improvements: * geo_point: Use warmer APIs * geo_point: Maybe caching helps, your query location is always the same. * geo_point: Maybe the geo_hash_cell filter helps you in terms of speed (needs a special mapping) * geo_shape: Less precision, less index size, you can change that in the mapping At the end of the day you are meeting a classic tradeoff here. Are you willing to use more disk, or are you willing to compute more things at query time? Hope it makes sense as a quick intro... --Alex On Wed, Mar 19, 2014 at 9:42 PM, Georgi Ivanov georgi.r.iva...@gmail.com wrote: Hi, I am indexing some pretty big amount of positions in ES (like 150M) in monthly based indexes (201312, 201311, etc). One document has a timestamp and location. My queries are like: Give me all positions inside this bounding box... etc. I have 2 types of indexes with exactly the same mapping except the location fields. Ex: loc: { type: geo_point } loc: { tree: quadtree type: geo_shape } It seems to me that there is a big difference in the speed of the queries against the two types of indexes. The index with location of type geo_shape is MUCH faster than the index with geo_point.
With cold caches the query with geo_point runs for about 26 seconds, where the query with geo_shape runs for like 2 seconds. Also the query with the geo_point type loads a huge amount of data into the field cache (8GB for just one month of data). With geo_shape, field data is much less. The geo_shape mapping is with default precision and quadtree type. Both queries have the same logic. I would like to understand why it is much faster with geo_shape than geo_point. Can someone shed some light on this matter? Of course the index with geo_shape is like 30% bigger in size. Example query for index type geo_shape { query: { bool: { must: [ { range: { ts: { from: 2013-11-01, to: 2013-12-30 } } }, { geo_shape: { loc: { shape: { type: envelope, coordinates: [ [ 1.6754645,53.786], [14.345234, 51.3453 ] ] } } } } ] } }, aggregations: { agg1: { terms: { field: e_id } } }, size: 0 } Example query for index type geo_point { query: { bool: { must: [ { range: { ts: { from: 2013-11-01, to: 2013-12-30 } } }, { geo_bounding_box : { loc : { top_left : { lat : 40.73, lon : -74.1 }, bottom_right : { lat : 40.01, lon : -71.12 } } } } ] } }, aggregations: { agg1: { terms: { field: e_id } } }, size: 0 }
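For reference, the two mappings under comparison, reconstructed as a sketch (the field and type names follow the thread; the index names and the precision value are illustrative — lowering geo_shape precision is the index-size lever Alex mentions):

```shell
curl -XPUT 'localhost:9200/track_point' -d '
{ "mappings" : { "position" : { "properties" : {
    "loc" : { "type" : "geo_point" } } } } }'

curl -XPUT 'localhost:9200/track_shape' -d '
{ "mappings" : { "position" : { "properties" : {
    "loc" : { "type" : "geo_shape", "tree" : "quadtree", "precision" : "100m" } } } } }'
```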
Re: elasticsearch java interaction
I still see port 9200. Several times we said this must be 9300. As master Yoda would say: Concentrate you must ! :)) On Friday, March 14, 2014 1:47:02 PM UTC+1, Venu Krishna wrote: Hi, I am Y.Venu, i am totally new to elasticsearch. Now i am trying to communicate between java and elasticsearch. i have gone through the elasticsearch java api's. 1st i came across the maven repository. i have created pom.xml in my eclipse and in the dependency tag i have just placed the code that i found in the maven repository, i.e. <dependency> <groupId>org.elasticsearch</groupId> <artifactId>elasticsearch</artifactId> <version>${es.version}</version> </dependency> After that i have created one class with the main method, and i copied and placed the code that i found in the client api of elasticsearch, i.e. TransportClient: main() { Client client = new TransportClient() .addTransportAddress(new InetSocketTransportAddress("host1", 9200)) .addTransportAddress(new InetSocketTransportAddress("host2", 9200)); // on shutdown client.close(); Settings settings = ImmutableSettings.settingsBuilder() .put("client.transport.sniff", true).build(); TransportClient client1 = new TransportClient(settings); } After running this java application, i am getting errors like this: In Main Method Mar 14, 2014 6:05:24 PM org.elasticsearch.node INFO: [Mister Machine] {elasticsearch/0.16.1}[11016]: initializing ...
Mar 14, 2014 6:05:24 PM org.elasticsearch.plugins INFO: [Mister Machine] loaded [] org.elasticsearch.common.inject.internal.ComputationException: org.elasticsearch.common.inject.internal.ComputationException: java.lang.NoClassDefFoundError: Lorg/apache/lucene/store/Lock; at org.elasticsearch.common.inject.internal.MapMaker$StrategyImpl.compute(MapMaker.java:553) at org.elasticsearch.common.inject.internal.MapMaker$StrategyImpl.compute(MapMaker.java:419) at org.elasticsearch.common.inject.internal.CustomConcurrentHashMap$ComputingImpl.get(CustomConcurrentHashMap.java:2041) at org.elasticsearch.common.inject.internal.FailableCache.get(FailableCache.java:46) at org.elasticsearch.common.inject.ConstructorInjectorStore.get(ConstructorInjectorStore.java:52) at org.elasticsearch.common.inject.ConstructorBindingImpl.initialize(ConstructorBindingImpl.java:57) at org.elasticsearch.common.inject.InjectorImpl.initializeBinding(InjectorImpl.java:377) at org.elasticsearch.common.inject.BindingProcessor$1$1.run(BindingProcessor.java:169) at org.elasticsearch.common.inject.BindingProcessor.initializeBindings(BindingProcessor.java:224) at org.elasticsearch.common.inject.InjectorBuilder.initializeStatically(InjectorBuilder.java:120) at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:105) at org.elasticsearch.common.inject.Guice.createInjector(Guice.java:92) at org.elasticsearch.common.inject.Guice.createInjector(Guice.java:69) at org.elasticsearch.common.inject.ModulesBuilder.createInjector(ModulesBuilder.java:58) at org.elasticsearch.node.internal.InternalNode.init(InternalNode.java:146) at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:159) at org.elasticsearch.node.NodeBuilder.node(NodeBuilder.java:166) at ES_Client.main(ES_Client.java:64) Caused by: org.elasticsearch.common.inject.internal.ComputationException: java.lang.NoClassDefFoundError: Lorg/apache/lucene/store/Lock; at 
org.elasticsearch.common.inject.internal.MapMaker$StrategyImpl.compute(MapMaker.java:553) at org.elasticsearch.common.inject.internal.MapMaker$StrategyImpl.compute(MapMaker.java:419) at org.elasticsearch.common.inject.internal.CustomConcurrentHashMap$ComputingImpl.get(CustomConcurrentHashMap.java:2041) at org.elasticsearch.common.inject.internal.FailableCache.get(FailableCache.java:46) at org.elasticsearch.common.inject.MembersInjectorStore.get(MembersInjectorStore.java:66) at org.elasticsearch.common.inject.ConstructorInjectorStore.createConstructor(ConstructorInjectorStore.java:69) at org.elasticsearch.common.inject.ConstructorInjectorStore.access$000(ConstructorInjectorStore.java:31) at org.elasticsearch.common.inject.ConstructorInjectorStore$1.create(ConstructorInjectorStore.java:39) at org.elasticsearch.common.inject.ConstructorInjectorStore$1.create(ConstructorInjectorStore.java:35) at org.elasticsearch.common.inject.internal.FailableCache$1.apply(FailableCache.java:35) at org.elasticsearch.common.inject.internal.MapMaker$StrategyImpl.compute(MapMaker.java:549) ... 17 more Caused by: java.lang.NoClassDefFoundError: Lorg/apache/lucene/store/Lock;
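The NoClassDefFoundError for org/apache/lucene/store/Lock above usually means the Lucene jars that elasticsearch depends on are missing from, or mismatched on, the classpath. With Maven they normally arrive transitively, but only if the `${es.version}` property is actually defined. A hypothetical pom.xml fragment (the version number is purely illustrative):

```xml
<properties>
  <!-- Define the property the dependency below refers to -->
  <es.version>1.0.1</es.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>${es.version}</version>
    <!-- Lucene (including org.apache.lucene.store.Lock) comes in transitively -->
  </dependency>
</dependencies>
```

Note that the log line mentions elasticsearch/0.16.1, so the client jar on the classpath should match the server version actually running.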
Re: elasticsearch java interaction
There is something wrong with your set-up. How many ES nodes do you have? On which IP addresses are the ES hosts listening? I understood you have 2 hosts, but it seems you have only one, on your local machine. This is the code (a bit modified) I am using at the moment:

public void connectES() {
    Set<String> hosts = new HashSet<String>();
    hosts.add("host1.mydomain.com");
    hosts.add("host2.host1.mydomain.com"); // Make sure this resolves to the proper IP address
    Settings settings = ImmutableSettings.settingsBuilder().put("cluster.name", "vesseltrackerES").build();
    TransportClient transportClient = new TransportClient(settings);
    for (String host : this.hosts) {
        transportClient = transportClient.addTransportAddress(new InetSocketTransportAddress(host, 9300));
    }
    System.out.print("Connected to nodes : ");
    for (DiscoveryNode node : transportClient.connectedNodes()) {
        System.out.print(node.getHostName() + ", ");
    }
    System.out.println();
    this.client = (Client) transportClient;
}

On Thursday, March 20, 2014 2:51:50 PM UTC+1, Venu Krishna wrote: Actually this is my elasticsearch index http://localhost:9200/. As you told me, I have replaced 9200 with 9300 in the above code; then I executed the application and I am getting the following exceptions. 
Mar 20, 2014 7:17:45 PM org.elasticsearch.client.transport WARNING: [Bailey, Gailyn] failed to get node info for [#transport#-1][inet[localhost/127.0.0.1:9300]] org.elasticsearch.transport.NodeDisconnectedException: [][inet[localhost/127.0.0.1:9300]][/cluster/nodes/info] disconnected Connected Mar 20, 2014 7:17:50 PM org.elasticsearch.client.transport WARNING: [Bailey, Gailyn] failed to get node info for [#transport#-1][inet[localhost/127.0.0.1:9300]] org.elasticsearch.transport.NodeDisconnectedException: [][inet[localhost/127.0.0.1:9300]][/cluster/nodes/info] disconnected Mar 20, 2014 7:17:50 PM org.elasticsearch.client.transport WARNING: [Bailey, Gailyn] failed to get node info for [#transport#-1][inet[localhost/127.0.0.1:9300]] org.elasticsearch.transport.NodeDisconnectedException: [][inet[localhost/127.0.0.1:9300]][/cluster/nodes/info] disconnected Thank you. On Thursday, March 20, 2014 7:12:14 PM UTC+5:30, David Pilato wrote: Use port 9300 -- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs Le 20 mars 2014 à 14:34, Venu Krishna yvgk...@gmail.com a écrit : Thank you for the reply. I am not getting any errors, but I am not able to connect to my elasticsearch using Java. Here is my code:

import java.net.InetSocketAddress;
import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public class JavaES_Client {

    void function() {
        // on StartUp
        System.out.println("In Function");
        Client client = new TransportClient()
                .addTransportAddress(new InetSocketTransportAddress("localhost", 9200));
        // This is where my control is getting stuck, without any exceptions or errors.
        System.out.println("Connected");
        // on ShutDown
        client.close();
    }

    public static void main(String[] args) {
        System.out.println("In Main Method");
        JavaES_Client jc = new JavaES_Client();
        System.out.println("Object Created");
        jc.function();
    }
}

On Thursday, March 20, 2014 2:20:25 PM UTC+5:30, Georgi Ivanov wrote: On Linux the file is /etc/hosts. On Windows it is c:\windows\system32\drivers\etc\hosts. Open the file in a text editor and add the following lines:

192.168.1.100 host1
192.168.1.101 host2

Make sure that 192.168.1.100/.101 is the right IP address of host1/host2. 2014-03-20 8:35 GMT+01:00 Venu Krishna yvgk...@gmail.com: Hi Georgi Ivanov, yes, I am able to understand the exception, i.e. UnresolvedAddressException, but you are telling me to make sure host1 and host2 are resolvable by adding entries to /etc/hosts or wherever that file is on Windows; can you give me the steps for how to approach this? Sorry, I am new to this and still learning, and I was unable to find a proper example. Thanks in advance for the help. On Thursday, March 20, 2014 2:36:10 AM UTC+5:30, Georgi Ivanov wrote: Well, I think UnresolvedAddressException obviously means that your Java client cannot resolve host1 and host2. Make sure host1 and host2 are resolvable by adding entries to /etc/hosts, or wherever that file is on Windows. On Friday, March 14, 2014 1:47:02 PM UTC+1, Venu Krishna wrote: Hi, I am Y.Venu, I am totally new to elasticsearch; now I am trying to make Java talk to elasticsearch. I have gone through the elasticsearch Java APIs; first I came across the Maven repository. I have created a pom.xml in my Eclipse project and in the dependency section I have just placed the code that I found in the Maven repository
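A quick way to check the /etc/hosts advice above, before touching the TransportClient at all, is to ask the JVM to resolve each configured hostname directly. A minimal sketch (the host names are placeholders; anything that fails here needs an /etc/hosts entry or a DNS fix):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostCheck {
    // An unresolvable name here is exactly what surfaces later as
    // UnresolvedAddressException when the TransportClient tries to connect.
    static boolean resolves(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String h : new String[]{"localhost", "host1", "host2"}) {
            System.out.println(h + " -> "
                    + (resolves(h) ? "resolves" : "NOT resolvable; add it to /etc/hosts"));
        }
    }
}
```

Only once every host resolves is it worth debugging the next layer (port 9300 reachable, cluster.name matching, client and server versions compatible).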
Re: Sort before filter?
I think sorting first will be bad if you have more data. Sorting is not exactly the fastest thing... It may sound good for a small amount of data, but what if we have 10 billion documents? Should ES go through all documents just to sort them? I don't think that would be good. On Wednesday, March 19, 2014 12:45:43 PM UTC+1, David Pfeffer wrote: I have an index that contains 30 GB worth of news stories. I want to return the stories that contain a particular name in their text, sorted chronologically. I only want the first 100 stories. ElasticSearch seems to approach this problem by filtering every story to just those that match, then sorting those results and returning the top 100. This uses a reasonably large amount of resources to filter every single one. Can I get ElasticSearch to instead sort first, and then filter in order until it reaches the maximum (100)? Granted, this would be 100 per shard, but then the final step would be to take each shard's 100, sort them all together, and take the top 100 of that result set. This should, at least in my mind, use significantly less resources, as it would only need to go through maybe 5000 items or so to find a match, as opposed to the entirety of the index. *(Cross-posted from http://stackoverflow.com/questions/22467585/sort-before-filters-in-elasticsearch, because I didn't get an answer there for 2 days.)*
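The final step David describes — combining each shard's 100 pre-sorted hits into a single global top 100 — is a standard k-way merge, and is roughly what ES already does in its reduce phase. A minimal sketch over plain timestamp lists (the data and names are illustrative, not the actual ES internals):

```java
import java.util.*;

public class TopKMerge {
    // Merge per-shard result lists (each already sorted ascending) into a global top-k.
    static List<Long> topK(List<List<Long>> shards, int k) {
        // Min-heap over {value, shardIndex, offsetInShard}.
        PriorityQueue<long[]> pq = new PriorityQueue<>((a, b) -> Long.compare(a[0], b[0]));
        for (int s = 0; s < shards.size(); s++) {
            if (!shards.get(s).isEmpty()) {
                pq.add(new long[]{shards.get(s).get(0), s, 0});
            }
        }
        List<Long> out = new ArrayList<>();
        while (!pq.isEmpty() && out.size() < k) {
            long[] head = pq.poll();
            out.add(head[0]);
            int s = (int) head[1], next = (int) head[2] + 1;
            if (next < shards.get(s).size()) {
                pq.add(new long[]{shards.get(s).get(next), s, next});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Long>> shards = Arrays.asList(
                Arrays.asList(1L, 4L, 9L),
                Arrays.asList(2L, 3L, 8L),
                Arrays.asList(5L, 6L, 7L));
        System.out.println(topK(shards, 4)); // prints [1, 2, 3, 4]
    }
}
```

The expensive part of David's question is not this merge — it is producing each shard's sorted, filtered top 100 in the first place, which is where Georgi's objection about scanning 10 billion documents applies.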
Re: elasticsearch java interaction
Well, I think UnresolvedAddressException obviously means that your Java client cannot resolve host1 and host2. Make sure host1 and host2 are resolvable by adding entries to /etc/hosts, or wherever that file is on Windows. On Friday, March 14, 2014 1:47:02 PM UTC+1, Venu Krishna wrote: Hi, I am Y.Venu. I am totally new to elasticsearch, and now I am trying to make Java talk to elasticsearch. I have gone through the elasticsearch Java APIs; first I came across the Maven repository. I created a pom.xml in my Eclipse project and in the dependencies section placed the snippet I found in the Maven repository, i.e.

<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>${es.version}</version>
</dependency>

After that I created one class with a main method and copied in the code that I found in the client API docs for elasticsearch, i.e. TransportClient:

main() {
    Client client = new TransportClient()
            .addTransportAddress(new InetSocketTransportAddress("host1", 9200))
            .addTransportAddress(new InetSocketTransportAddress("host2", 9200));
    // on shutdown
    client.close();
    Settings settings = ImmutableSettings.settingsBuilder()
            .put("client.transport.sniff", true).build();
    TransportClient client1 = new TransportClient(settings);
}

After running this Java application, I am getting errors like this: In Main Method Mar 14, 2014 6:05:24 PM org.elasticsearch.node INFO: [Mister Machine] {elasticsearch/0.16.1}[11016]: initializing ... 
Mar 14, 2014 6:05:24 PM org.elasticsearch.plugins INFO: [Mister Machine] loaded [] org.elasticsearch.common.inject.internal.ComputationException: org.elasticsearch.common.inject.internal.ComputationException: java.lang.NoClassDefFoundError: Lorg/apache/lucene/store/Lock; at org.elasticsearch.common.inject.internal.MapMaker$StrategyImpl.compute(MapMaker.java:553) at org.elasticsearch.common.inject.internal.MapMaker$StrategyImpl.compute(MapMaker.java:419) at org.elasticsearch.common.inject.internal.CustomConcurrentHashMap$ComputingImpl.get(CustomConcurrentHashMap.java:2041) at org.elasticsearch.common.inject.internal.FailableCache.get(FailableCache.java:46) at org.elasticsearch.common.inject.ConstructorInjectorStore.get(ConstructorInjectorStore.java:52) at org.elasticsearch.common.inject.ConstructorBindingImpl.initialize(ConstructorBindingImpl.java:57) at org.elasticsearch.common.inject.InjectorImpl.initializeBinding(InjectorImpl.java:377) at org.elasticsearch.common.inject.BindingProcessor$1$1.run(BindingProcessor.java:169) at org.elasticsearch.common.inject.BindingProcessor.initializeBindings(BindingProcessor.java:224) at org.elasticsearch.common.inject.InjectorBuilder.initializeStatically(InjectorBuilder.java:120) at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:105) at org.elasticsearch.common.inject.Guice.createInjector(Guice.java:92) at org.elasticsearch.common.inject.Guice.createInjector(Guice.java:69) at org.elasticsearch.common.inject.ModulesBuilder.createInjector(ModulesBuilder.java:58) at org.elasticsearch.node.internal.InternalNode.init(InternalNode.java:146) at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:159) at org.elasticsearch.node.NodeBuilder.node(NodeBuilder.java:166) at ES_Client.main(ES_Client.java:64) Caused by: org.elasticsearch.common.inject.internal.ComputationException: java.lang.NoClassDefFoundError: Lorg/apache/lucene/store/Lock; at 
org.elasticsearch.common.inject.internal.MapMaker$StrategyImpl.compute(MapMaker.java:553) at org.elasticsearch.common.inject.internal.MapMaker$StrategyImpl.compute(MapMaker.java:419) at org.elasticsearch.common.inject.internal.CustomConcurrentHashMap$ComputingImpl.get(CustomConcurrentHashMap.java:2041) at org.elasticsearch.common.inject.internal.FailableCache.get(FailableCache.java:46) at org.elasticsearch.common.inject.MembersInjectorStore.get(MembersInjectorStore.java:66) at org.elasticsearch.common.inject.ConstructorInjectorStore.createConstructor(ConstructorInjectorStore.java:69) at org.elasticsearch.common.inject.ConstructorInjectorStore.access$000(ConstructorInjectorStore.java:31) at org.elasticsearch.common.inject.ConstructorInjectorStore$1.create(ConstructorInjectorStore.java:39) at org.elasticsearch.common.inject.ConstructorInjectorStore$1.create(ConstructorInjectorStore.java:35) at org.elasticsearch.common.inject.internal.FailableCache$1.apply(FailableCache.java:35) at
Re: Is this stacktrace a reason for cluster instability?
For me this exception is just saying that ES couldn't convert "xxx-hdp13" to a Date object. It sounds like just an incorrect query, so I don't think this is the reason for your troubles. Note that the message has severity DEBUG, not ERROR or CRIT. On Wednesday, March 19, 2014 11:17:23 AM UTC+1, Jelle Smet wrote: Hi List, I'm running ES 1.0.1 in a 6-node configuration. Some days ago we experienced instability issues with our cluster. At a certain moment no documents could be indexed anymore. After restarting the indexing processes (Logstash), indexing worked again for a short period of time, only to stall again after a brief period. The ES cluster had to be restarted to restore normal behavior. After that, some (recent) index shards stayed unassigned. I dropped the replication value for those indexes to 0 in order to clear the unassigned shards. Enabling replication resulted again in unassigned shards for the impacted indexes. I have left the replication level at 0 for the troublesome indexes in order to clear the cluster status. Meanwhile, indexing and replication work again for any newly created indexes. What we noticed: - The logs didn't reveal any immediate cause. The only reported issue we saw in the logs prior to the incident is the stack trace below. Afterwards, we have countered this issue by creating a template which enforces the offending field to be treated as a string, which should prevent the error mentioned below. - Our collected ES metrics revealed a sudden drop in the total number of Java threads at the exact same moment. My question: could the stack trace below be the cause of any cluster instability? 
Thanks, Jelle [2014-03-15 03:15:38,414][DEBUG][action.bulk ] [-xxx-logs-001] [logstash-2014.03.15][0] failed to execute bulk item (index) index {[logstash-2014.03.15][logs][MBzzcesjQTSjUBhryTAgzQ], source[{@source:tcp://:0:0:0:0:0:0:1:60186/,@tags:[],@fields:{timestamp:[Mar 15 04:15:37],logsource:[xxx-sss01],program:[snmptrapd],snmptrapsource:[xxx-hdp04],snmptrapseverity:[INFORMATIONAL],message:[CLI/Telnet user logout: iRMC S2 CLI/Telnet user '' logout from xxx.xxx.xxx.xxx]},@timestamp:2014-03-15T03:15:37.931+00:00,@message:13Mar 15 04:15:37 xxx-sss01 snmptrapd: xxx-hdp04 INFORMATIONAL CLI/Telnet user logout: iRMC S2 CLI/Telnet user '' logout from xxx.xxx.xxx.xxx\n,@type:snmptrapd,@collector:[xxx-sss01.xxx.xx],@version:1}]} org.elasticsearch.index.mapper.MapperParsingException: failed to parse [@fields.snmptrapsource] at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:418) at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:616) at org.elasticsearch.index.mapper.object.ObjectMapper.serializeArray(ObjectMapper.java:604) at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:461) at org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:517) at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:459) at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:515) at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:462) at org.elasticsearch.index.shard.service.InternalIndexShard.prepareCreate(InternalIndexShard.java:371) at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:400) at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:153) at 
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:556) at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:426) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: org.elasticsearch.index.mapper.MapperParsingException: failed to parse date field [xxx-hdp04], tried both date format [dateOptionalTime], and timestamp number with locale [] at org.elasticsearch.index.mapper.core.DateFieldMapper.parseStringValue(DateFieldMapper.java:582) at org.elasticsearch.index.mapper.core.DateFieldMapper.innerParseCreateField(DateFieldMapper.java:510) at org.elasticsearch.index.mapper.core.NumberFieldMapper.parseCreateField(NumberFieldMapper.java:215) at
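The template workaround Jelle mentions could look roughly like this on ES 1.x (the index pattern and type name are taken from the log above; the exact template body is an assumption, not his actual config):

```json
{
  "template": "logstash-*",
  "mappings": {
    "logs": {
      "properties": {
        "@fields": {
          "properties": {
            "snmptrapsource": { "type": "string", "index": "not_analyzed" }
          }
        }
      }
    }
  }
}
```

Without such a template, dynamic mapping lets the first value seen for the field decide its type; if an early value happened to look like a date, a later hostname such as xxx-hdp04 then fails to parse, producing exactly the MapperParsingException in the trace above.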
Re: Sort before filter?
I don't know what kind of problems you have. You may try to post your mappings, number of documents, index count, server count, server configuration (memory?), etc. here and we can try to think of something. 30 GB doesn't sound like much for ES. On Wednesday, March 19, 2014 12:45:43 PM UTC+1, David Pfeffer wrote: I have an index that contains 30 GB worth of news stories. I want to return the stories that contain a particular name in their text, sorted chronologically. I only want the first 100 stories. ElasticSearch seems to approach this problem by filtering every story to just those that match, then sorting those results and returning the top 100. This uses a reasonably large amount of resources to filter every single one. Can I get ElasticSearch to instead sort first, and then filter in order until it reaches the maximum (100)? Granted, this would be 100 per shard, but then the final step would be to take each shard's 100, sort them all together, and take the top 100 of that result set. This should, at least in my mind, use significantly less resources, as it would only need to go through maybe 5000 items or so to find a match, as opposed to the entirety of the index. *(Cross-posted from http://stackoverflow.com/questions/22467585/sort-before-filters-in-elasticsearch, because I didn't get an answer there for 2 days.)*
slow indexing geo_shape
Hi, I am playing with the geo_shape type and I am experiencing very slow indexing times. For example, one simple linestring with a couple of hundred points can take up to 60 seconds to index. I tried the geohash and quadtree implementations. With quadtree it is faster (like 50% faster), but still not fast enough. Using the Java API (bulk indexing). Mapping:

{
  "entity": {
    "properties": {
      "id":    { "type": "integer" },
      "track": { "type": "geo_shape", "precision": "20m", "tree": "quadtree" },
      "date":  { "type": "date" }
    }
  }
}

My ES cluster is tuned for indexing as follows:

index.refresh_interval: 30s
index.translog.flush_threshold_ops: 10
indices.memory.index_buffer_size: 15%
threadpool.bulk.queue_size: 500
threadpool.bulk.size: 100
threadpool.bulk.type: fixed

Any tips on how to make indexing faster? My estimate is that one day of data would take about 10 hours to index (and I need to index about 3 years of data).
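Indexing cost for geo_shape grows steeply with precision: each shape is decomposed into every quadtree cell it crosses at the target resolution, so 20m precision over a long linestring generates a very large number of terms per document. Relaxing precision is usually the single biggest lever. A hypothetical mapping to experiment with (the 200m figure is illustrative; whether it is acceptable depends on query needs):

```json
{
  "entity": {
    "properties": {
      "id":    { "type": "integer" },
      "track": { "type": "geo_shape", "tree": "quadtree", "precision": "200m" },
      "date":  { "type": "date" }
    }
  }
}
```

Each halving of precision removes roughly one tree level, and since the number of cells a line crosses roughly doubles per level, a coarser setting can cut the generated terms, and thus indexing time, dramatically.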