Re: stuck thread problem?
FYI, this turned out to be a real bug. A fix has been committed and will be included in the next release.

On Wednesday, August 27, 2014 11:36:03 AM UTC+2, Martin Forssen wrote: I did report it: https://github.com/elasticsearch/elasticsearch/issues/7478
Re: Explicitly Copying Replica Shards That Fail to Start
Thank you Mark! Setting { "index" : { "number_of_replicas" : 0 } } and then back to 1 cleared the bad replicas and rebuilt them from the primaries. Much appreciated, David

On Thursday, August 28, 2014 3:53:32 PM UTC-7, Mark Walkom wrote: Yep, the easiest way is to drop the replica and then add it back and see how you go. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 29 August 2014 08:40, David Kleiner david@gmail.com wrote: Greetings, I am still having a problem with the recovery of 5 replica shards in 2 indices on my 3-way cluster. The replica shards fail to initialize and keep jumping around the two secondary nodes. The primary shards are fine. What is my path to recovery? Is copying the primary shards to the secondary nodes a correct approach? I tried issuing routing commands to cancel recovery/allocation; it helped with some replica shards but not with the 5 in question. I also tried dumping the indices with the failing replica shards, but two nodes crashed (well, lost connection to the cluster) so the dump failed. Would setting the replica count to 0, copying the primaries to the 2 nodes, and setting the replica count back to 1 be a viable alternative? Thank you, David
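The replica reset David describes uses the index settings API; a minimal sketch (myindex is a placeholder for the affected indices):

curl -XPUT 'localhost:9200/myindex/_settings' -d '{ "index" : { "number_of_replicas" : 0 } }'
# once the bad replicas are gone and the cluster is green again, add them back:
curl -XPUT 'localhost:9200/myindex/_settings' -d '{ "index" : { "number_of_replicas" : 1 } }'

Elasticsearch rebuilds the new replicas by copying from the healthy primaries, so no manual file copying is needed.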
Re: stuck thread problem?
Thank you!

On 29 August 2014, at 08:49, Martin Forssen m...@recordedfuture.com wrote: FYI, this turned out to be a real bug. A fix has been committed and will be included in the next release.
Re: I search the same thing, but sometimes get results and sometimes not???
I can confirm that only one master node exists and that the 16 nodes work as one cluster. I think I know what happened: with the default configuration, successive requests alternate between executing on the primary shards and on the replica shards. What I could not understand is why the primary shards and the replica shards give different results at the same point in time. This happens when I index some new documents without refreshing; if I refresh the cluster, then the primary shards and the replica shards give the same result.

On Thursday, August 28, 2014 6:41:22 PM UTC+8, Greg Murnane wrote: This is a symptom that could happen with bad GC events, or with split brain. Can you look at the GC logging output to see how long the stop-the-world pauses you're seeing are? You can also run a query like curl -XGET 'http://localhost:9200/_cluster/state/master_node?local=true' on each of the nodes to make sure that they agree on which one is the master node. Look also at wait CPU and disk utilization when you run a query. Unless you have a physical disk for each node on this system, it's likely that there can be IO contention with 16 nodes querying the disks. If all that looks OK, and if you are running replicas, then you can try pulling out a replica and an original, loading them into an isolated ES node on another system, and querying there. It's possible that some of the replicas are corrupted, and this would allow you to detect that. Out of curiosity, though, I wonder what the purpose of running so many nodes on a single machine is. ES is very effective at using the entire CPU with only one node, and replicating your heap size 16 times, adding IO contention, and splitting the cache 16 ways all seem like they would hurt performance immensely.
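The behaviour described above matches how near-real-time search works: newly indexed documents only become searchable after a refresh, and each shard copy refreshes on its own schedule, so a primary and its replica can briefly disagree. Forcing a refresh makes all copies consistent; a minimal example (myindex is a placeholder, omit it to refresh all indices):

curl -XPOST 'http://localhost:9200/myindex/_refresh'

By default each shard refreshes about once per second (index.refresh_interval), which bounds how long the divergence lasts.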
help needed scripting update to list element (bulk request)
Hi, Say I have a list of elements like this:

PUT twitter/twit/1
{ "list": [ { "a": "b", "c": "d", "e": "f" }, { "1": "2", "3": "4" } ] }

And I want to change the value of e (currently f) to, say, new_f, such that the document looks like:

{ "list": [ { "a": "b", "c": "d", "e": "new_f" }, { "1": "2", "3": "4" } ] }

Is there a way to do this? Maybe in MVEL? Do I match on the document { "a": "b", "c": "d", "e": "f" }, i.e. if list.contains(document) { some kind of update; } // is this possible? I know MVEL is being deprecated in 1.4; however, it will do for now. I want to use a bulk request. I know it's possible to remove the element like this: bulkRequestBuilder.setScript("if (ctx._source.list.contains(document)) {ctx._source.list.remove(document)}").setScriptParams(...) etc., but is it possible to update a field in the document as well? Thanks.
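One possible approach, as an untested sketch rather than a verified answer: instead of removing the matching element, iterate over the list in the script and mutate the element in place. The document and new_value script params below are hypothetical names, the field names come from the example above, and the script is MVEL-style as used by default before 1.4:

curl -XPOST 'localhost:9200/twitter/twit/1/_update' -d '{
  "script": "foreach (item : ctx._source.list) { if (item == document) { item.e = new_value } }",
  "params": {
    "document": { "a": "b", "c": "d", "e": "f" },
    "new_value": "new_f"
  }
}'

Because the list elements are maps, assigning item.e changes the source that gets reindexed; the same script and params should also work in an update action of a bulk request.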
Bulk UDP API
I'm trying to index data using the bulk UDP API on a single-node Elasticsearch 1.3.2. In my elasticsearch config I have bulk.udp.enabled: true. My bulk file has 85000 documents and the following characteristics:

bart@hp-g7-02:~/git/data$ ls -al mydata.json
-rw-rw-r-- 1 bart bart 97818287 Aug 28 15:43 mydata.json
bart@hp-g7-02:~/git/data$ wc -l mydata.json
170001 mydata.json
bart@hp-g7-02:~/git/data$ file mydata.json
mydata.json: UTF-8 Unicode English text, with very long lines

Indexing the data using the bulk API described at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-bulk.html works: I see the documents in my Elasticsearch store once the bulk upload is finished. However, if I use the same bulk file and try to index it using the command

cat mydata.json | nc -w 0 -u localhost 9700

then only 1 document gets indexed, and I see lots of parsing errors like the following in my log files:

[2014-08-29 11:28:41,649][WARN ][bulk.udp ] [Mysterio] failed to execute bulk request
org.elasticsearch.common.jackson.core.JsonParseException: Unrecognized token '_index': was expecting ('true', 'false' or 'null')
 at [Source: [B@656f95ce; line: 1, column: 15]
    at org.elasticsearch.common.jackson.core.JsonParser._constructError(JsonParser.java:1419)
    at org.elasticsearch.common.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:508)
    at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser._reportInvalidToken(UTF8StreamJsonParser.java:3201)
    at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser._handleUnexpectedValue(UTF8StreamJsonParser.java:2360)
    at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:794)
    at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:690)
    at org.elasticsearch.common.xcontent.json.JsonXContentParser.nextToken(JsonXContentParser.java:50)
    at org.elasticsearch.action.bulk.BulkRequest.add(BulkRequest.java:266)
    at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:256)
    at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:252)
    at org.elasticsearch.bulk.udp.BulkUdpService$Handler.messageReceived(BulkUdpService.java:181)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.elasticsearch.common.netty.channel.socket.nio.NioDatagramWorker.read(NioDatagramWorker.java:98)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.elasticsearch.common.netty.channel.socket.nio.NioDatagramWorker.run(NioDatagramWorker.java:343)
    at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

I find it strange that things work using the usual bulk API, but not with the bulk UDP API. Am I overlooking something or doing something wrong? Thanks, Bart
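Both the HTTP bulk endpoint and the bulk UDP service consume the same newline-delimited format, one action line followed by one source line per document; a minimal sketch (index, type, and field names are placeholders):

{ "index" : { "_index" : "myindex", "_type" : "mytype", "_id" : "1" } }
{ "field1" : "value1" }

A parser failing on the token '_index' suggests it started reading in the middle of an action/source pair. Since each UDP datagram is parsed independently, a datagram boundary falling mid-pair would produce exactly this kind of error, so sending the file in chunks that end on pair boundaries may be worth trying (this is an inference from the error, not something confirmed in the thread).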
Replica assignment on the same host
Hi, I have an ES cluster with 12 data nodes spread over 6 servers (so 2 nodes per server), and I saw that replicas of a shard can be allocated on the same server (one on each of the nodes hosted by that server). To avoid this I have set these parameters on the cluster: node.host: server_name and cluster.routing.allocation.awareness.attributes: zone,host. But I'm wondering whether there isn't a specific parameter for this instead of using cluster allocation awareness? Nicolas
Re: Replica assignment on the same host
That's the best method, as per http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-cluster.html#allocation-awareness Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 29 August 2014 20:45, 'Nicolas Fraison' via elasticsearch elasticsearch@googlegroups.com wrote: Hi, I have an ES cluster with 12 data nodes spread over 6 servers (so 2 nodes per server), and I saw that replicas of a shard can be allocated on the same server. ...
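A minimal configuration sketch of that awareness-based setup (server_1 is a placeholder value; both nodes on the same physical server get the same value):

# elasticsearch.yml on the two nodes of server 1
node.host: server_1
cluster.routing.allocation.awareness.attributes: host

With host as an awareness attribute, Elasticsearch avoids allocating a shard and its replica on nodes that share the same host value. As an aside, there is also a dedicated setting, cluster.routing.allocation.same_shard.host, which when set to true prevents copies of the same shard from landing on the same machine; check the reference documentation for your version to confirm it is available.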
Re: stuck thread problem?
Hi Patrick, Did you see the same stuck thread via jstack or the hot threads API that Martin reported? This can only happen if scan search was enabled (by setting search_type=scan in a search request). If that isn't the case, then maybe something else is stuck. Martijn

On 29 August 2014 09:58, Patrick Proniewski elasticsea...@patpro.net wrote: Thank you!

On 29 August 2014, at 08:49, Martin Forssen m...@recordedfuture.com wrote: FYI, this turned out to be a real bug. A fix has been committed and will be included in the next release.

-- Kind regards, Martijn van Groningen
elasticsearch template to use standard analyzer but additional token_filter word_delimiter
Hi, I am using Logstash and Elasticsearch for log analysis. The standard analyzer does a pretty good job; however, it will not split things like word1.word2. Therefore, I want to add the token filter word_delimiter. What would such an additional Logstash template look like? Also, how can I limit this addition to just certain fields? Thx Marc
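A sketch of what such a template could look like, assuming it targets logstash-* indices and that only the message field should get the extra filter (the template name, analyzer name, and field choice are illustrative, not a tested Logstash-shipped template):

curl -XPUT 'localhost:9200/_template/logstash_word_delimiter' -d '{
  "template": "logstash-*",
  "settings": {
    "analysis": {
      "analyzer": {
        "std_with_delimiter": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["standard", "lowercase", "word_delimiter"]
        }
      }
    }
  },
  "mappings": {
    "_default_": {
      "properties": {
        "message": { "type": "string", "analyzer": "std_with_delimiter" }
      }
    }
  }
}'

Limiting the addition to certain fields falls out of the mapping: only fields that declare "analyzer": "std_with_delimiter" use it, while everything else keeps the default analyzer.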
Re: stuck thread problem?
Hi Patrick, If this problem happens again, then you should execute the hot threads API: curl localhost:9200/_nodes/hot_threads Documentation: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-hot-threads.html#cluster-nodes-hot-threads Just pick a node in your cluster and run that command. This is the equivalent of running jstack on all the nodes in your cluster. Martijn

On 29 August 2014 13:34, Patrick Proniewski elasticsea...@patpro.net wrote: Hi, I don't know how to debug a Java process. I hadn't heard about jstack until it was mentioned in this thread. All I know is what I've posted in my first message. I've restarted ES, and currently I have no stuck thread to investigate. In the meantime, you can teach me how I should use jstack, so next time it happens I'll be ready.

On 29 August 2014 13:19, Martijn van Groningen martijn.v.gronin...@gmail.com wrote: Hi Patrick, Did you see the same stuck thread via jstack or the hot threads API that Martin reported? ...

-- Kind regards, Martijn van Groningen
which class file triggers writing of segments.gen / segments_1
Hello people, Does anybody know which class/component in Elasticsearch triggers the writing of segments.gen and segments_1? I'm currently using Elasticsearch version 1.2.1. It would be great if you could provide a link pinpointing which line in the class does that. Thank you. /Jason
Re: which class file triggers writing of segments.gen / segments_1
This is Lucene, when indexing starts. Look at the SegmentInfos class: https://lucene.apache.org/core/4_9_0/core/org/apache/lucene/index/SegmentInfos.html Jörg

On Fri, Aug 29, 2014 at 2:38 PM, Jason Wee peich...@gmail.com wrote: Hello people, Does anybody know which class/component in Elasticsearch triggers the writing of segments.gen and segments_1? ...
Re: Bulk UDP API
Maybe it is the line feeds in mydata.json; perhaps you are not using UNIX LFs (a single \n)? Jörg

On Fri, Aug 29, 2014 at 11:36 AM, Bart Vandewoestyne bart.vandewoest...@gmail.com wrote: I'm trying to index data using the bulk UDP API on a single-node Elasticsearch 1.3.2. ...
Does the transport client do scatter-gather?
Just as the subject asks, or can only the node client do scatter-gather? Thanks
Refactoring idea for buildShardFailures()?
Hi Sir/Madam, I'm doing some research on automatic refactoring suggestion. By observing the co-change pattern of similar code, we would like to develop a tool that suggests possible refactorings for extracting common code while parameterizing any differences. I have examined the code snippets in the classes org.elasticsearch.action.search.type.TransportSearchScrollScanAction.AsyncAction, org.elasticsearch.action.search.type.TransportSearchScrollQueryAndFetchAction.AsyncAction, and org.elasticsearch.action.search.type.TransportSearchScrollQueryThenFetchAction.AsyncAction. I notice that all three classes define a buildShardFailures() method. The method bodies are quite similar, and they have experienced similar or identical changes at least once in the version history. Do you think it is a good or bad idea to extract a single method out of these methods? Whether or not you would extract a method, would you share the factors that affect your decision, such as complexity of the refactoring, poor readability, poor maintainability, etc.? For each factor, how does it affect your decision about refactoring? If possible, any quantitative analysis would be great. For example: if the code size after refactoring is greater than before, I won't refactor; or if only two lines are shared between two code snippets, I won't refactor, etc. Thanks a lot for your help! Your suggestions will be very valuable for our research. Best regards, Na Meng
Re: JVM crash on 64 bit SPARC with Elasticsearch 1.2.2 due to unaligned memory access
Thanks again, and sorry to bother you guys, but I'm new to GitHub and don't know what to do from here. Can you point me to the right place where I can take the next step to put this patch on my server? I only know how to untar the tarball I downloaded from the main ES page. Thanks. Tony

On Wednesday, August 27, 2014 1:35:06 PM UTC-4, tony@iqor.com wrote: Kudos! Tony

On Wednesday, August 27, 2014 1:16:11 PM UTC-4, Jörg Prante wrote: All praise should go to the fantastic Elasticsearch team, who did not hesitate to test the fix immediately and replaced it with a better working solution, since the lzf-compress software has weaknesses regarding thread safety. Jörg

On Wed, Aug 27, 2014 at 7:01 PM, Ivan Brusic iv...@brusic.com wrote: Amazing job. Great work. -- Ivan

On Tue, Aug 26, 2014 at 12:41 PM, joerg...@gmail.com wrote: I fixed the issue by setting the safe LZF encoder in LZFCompressor and opened a pull request: https://github.com/elasticsearch/elasticsearch/pull/7466 Jörg

On Tue, Aug 26, 2014 at 8:17 PM, joerg...@gmail.com wrote: Still broken with lzf-compress 1.0.3: https://gist.github.com/jprante/d2d829b497db4963aea5 Jörg

On Tue, Aug 26, 2014 at 7:54 PM, joerg...@gmail.com wrote: Thanks for the logstash mapping command. I can reproduce it now. It's the LZF encoder that bails out at org.elasticsearch.common.compress.lzf.impl.UnsafeChunkEncoderBE._getInt, which in turn uses sun.misc.Unsafe.getInt. I have created a gist of the JVM crash file at https://gist.github.com/jprante/79f4b4c0b9fd83eb1c9b There has been a fix in LZF lately (https://github.com/ning/compress/commit/db7f51bddc5b7beb47da77eeeab56882c650bff7) for version 1.0.3, which was released recently. I will build a snapshot ES version with LZF 1.0.3 and see if this works... Jörg

On Mon, Aug 25, 2014 at 11:30 PM, tony@iqor.com wrote: I captured a WireShark trace of the interaction between ES and Logstash 1.4.1. The error occurs even before my data is sent. Can you try to reproduce it on your testbed with this message I captured? curl -XPUT http://amssc103-mgmt-app2:9200/_template/logstash -d @y Contents of file 'y':

{ "template" : "logstash-*", "settings" : { "index.refresh_interval" : "5s" }, "mappings" : { "_default_" : { "_all" : { "enabled" : true }, "dynamic_templates" : [ { "string_fields" : { "match" : "*", "match_mapping_type" : "string", "mapping" : { "type" : "string", "index" : "analyzed", "omit_norms" : true, "fields" : { "raw" : { "type" : "string", "index" : "not_analyzed", "ignore_above" : 256 } } } } } ], "properties" : { "@version" : { "type" : "string", "index" : "not_analyzed" }, "geoip" : { "type" : "object", "dynamic" : true, "path" : "full", "properties" : { "location" : { "type" : "geo_point" } } } } } } }

On Monday, August 25, 2014 3:53:18 PM UTC-4, tony@iqor.com wrote: I have no plugins installed (yet) and only changed es.logger.level to DEBUG in logging.yml. elasticsearch.yml:

cluster.name: es-AMS1Cluster
node.name: KYLIE1
node.rack: amssc2client02
path.data: /export/home/apontet/elasticsearch/data
path.work: /export/home/apontet/elasticsearch/work
path.logs: /export/home/apontet/elasticsearch/logs
network.host: <= sanitized line; file contains actual server IP
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: [s1, s2, s3, s5, s6, s7] <= also sanitized

Thanks, Tony

On Saturday, August 23, 2014 6:29:40 AM UTC-4, Jörg Prante wrote: I tested a simple Hello World document on Elasticsearch 1.3.2 with Oracle JDK 1.7.0_17 64-bit Server VM, Sparc Solaris 10, default settings. No issues. So I would like to know more about the settings in elasticsearch.yml, the mappings, and the installed plugins. Jörg

On Sat, Aug 23, 2014 at 11:25 AM, joerg...@gmail.com wrote: I have some Solaris 10 Sparc V440/V445 servers available and can try to reproduce over the weekend. Jörg

On Sat, Aug 23, 2014 at 4:37 AM, Robert Muir rober...@elasticsearch.com wrote: How big is it? Maybe I can have it anyway? I pulled two ancient UltraSPARCs out of my closet to try to debug your issue, but unfortunately they are a pain to work with (dead NVRAM battery on both, zeroed MAC address, etc.). I'd still love to get to the bottom of this.

On Aug 22, 2014 3:59 PM, tony@iqor.com wrote: Hi Adrien, It's a bunch of garbled binary data, basically a dump of the process image. Tony

On Thursday, August 21, 2014 6:36:12 PM UTC-4, Adrien Grand wrote: Hi Tony, Do you have more information in the core dump file? (cf. the Core dump written line that
Re: Stop words and Keyword tokenizer
Thanks Ivan! I'll test which way fits my needs better.

2014-08-28 17:12 GMT-05:00 Ivan Brusic i...@brusic.com: Character filters are executed before the tokenizer, so only something in that family of filters would work if you plan to continue using the keyword tokenizer. http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-mapping-charfilter.html The mapping char filter might be a better match if your list is not in regex form. I use the mapping char filter to remove copyright, trademark and a whole list of other characters from my content. Cheers, Ivan

On Thu, Aug 28, 2014 at 2:33 PM, Germán Carrillo carrillo.ger...@gmail.com wrote: Ivan, yes, I'm aware I would obtain another text, and that's fine. Even more, my docs have a display field to be returned to users after a search. For the example given above, the display value would be something like: Mulaló, Yumbo, Valle del Cauca. Itamar, I've actually considered several options. I think a synonym file would be too big. I gave you 11 equivalent terms (you might've noticed I could have continued with around 30 equivalent ways), but I didn't mention that place names (alone) have their corresponding synonyms, alternate names, abbreviations, and vernacular names. There could be 10k different places (docs) in the index. :D Also, taking every single case into account in the synonym file seems sub-optimal. Really, I intend to normalize a large number of ways of expressing place hierarchy into a few ways. Otherwise I'd have to build very large lists for each place I add to the index, and nothing prevents me from missing a weird case. BTW, handling hierarchy is a must, otherwise result disambiguation would be a nightmare for users. Thanks for all the discussion, it's certainly valuable to read an expert's opinion. Back to my very first question: is the pattern replace token filter the only way to remove stop words from tokens obtained from a keyword tokenizer? And aren't those regular expressions expensive?

2014-08-28 15:49 GMT-05:00 Ivan Brusic i...@brusic.com: You mentioned in your original post: I'd like to obtain the original text without stop words. The stopword-less phrase will indeed be present in the index after the analysis phase; however, when you ask for this content back as a result of a query, the original text will be returned. What is indexed is not necessarily what is stored/returned. Cheers, Ivan

On Thu, Aug 28, 2014 at 12:30 PM, Germán Carrillo carrillo.ger...@gmail.com wrote: Thanks Ivan, do you mean what I obtain from a request such as curl -XGET 'localhost:9200/_analyze?tokenizer=keyword&filters=lowercase,my_ascii_folding,my_stopwords' -d 'El corregimiento de Mulaló, jurisdicción del municipio de Yumbo (Valle del Cauca)' is not what will be present in the index after the analysis process? If so, how could I check whether the stop words filter is (or will be) applied to a sample phrase?

2014-08-28 14:03 GMT-05:00 Ivan Brusic i...@brusic.com: Also note that the content returned will still contain the stop words. Only the inverted index will contain the stopword-less content. -- Ivan

On Thu, Aug 28, 2014 at 11:55 AM, Itamar Syn-Hershko ita...@code972.com wrote: What would be the use case for such a process (removing stop words without tokenization)? This may be a good read btw: http://www.elasticsearch.org/blog/stop-stopping-stop-words-a-look-at-common-terms-query/ -- Itamar Syn-Hershko http://code972.com | @synhershko Freelance Developer & Consultant, Author of RavenDB in Action http://manning.com/synhershko/

On Thu, Aug 28, 2014 at 9:48 PM, German Carrillo carrillo.ger...@gmail.com wrote: Hi all, I'm looking for a way to remove stop words from tokens returned by a keyword tokenizer, i.e., I'd like to obtain the original text without stop words after the analysis process. Sample data looks like: El corregimiento de Mulaló, jurisdicción del municipio de Yumbo (Valle del Cauca). After the lowercase token filter: el corregimiento de mulaló, jurisdicción del municipio de yumbo (valle del cauca). After the ascii folding token filter: el corregimiento de mulalo, jurisdiccion del municipio de yumbo (valle del cauca). After removing stop words: corregimiento mulalo, municipio yumbo (valle cauca). The stop words (currently) are: [la, el, de, del, los, las, jurisdiccion]. Is the pattern replace token filter the only (or best) way to go for such a task? I'd really like to avoid writing custom regular expressions rather than specifying a stop words list, which I know would work perfectly fine for other tokenizers. Regards, Germán
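To make the options concrete, here is an untested settings sketch using a pattern_replace char filter in front of the keyword tokenizer; the index name, analyzer name, and stop-word list come from the discussion above. Note that char filters run before the lowercase and asciifolding token filters, so the pattern has to handle case itself (the (?i) flag below), and accented forms such as jurisdicción would need to be added to the alternation:

curl -XPUT 'localhost:9200/places' -d '{
  "settings": {
    "analysis": {
      "char_filter": {
        "strip_stopwords": {
          "type": "pattern_replace",
          "pattern": "(?i)\\b(la|el|de|del|los|las|jurisdiccion)\\b",
          "replacement": ""
        }
      },
      "analyzer": {
        "keyword_no_stopwords": {
          "type": "custom",
          "char_filter": ["strip_stopwords"],
          "tokenizer": "keyword",
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  }
}'

A pattern_replace token filter placed after the keyword tokenizer would achieve the same result, since the keyword tokenizer emits the whole input as a single token; either way, leftover double spaces may need a second pattern_replace to tidy up.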
Re: JVM crash on 64 bit SPARC with Elasticsearch 1.2.2 due to unaligned memory access
Do you want to build from source? Or do you want to install a fresh binary? At jenkins.elasticsearch.org I cannot find any snapshot builds, but it may be just me. It would be a nice add-on to provide snapshot builds for users who eagerly await bug fixes, or who want to ride the bleeding edge before the next release arrives, without release notes etc. Jörg

On Fri, Aug 29, 2014 at 4:29 PM, tony.apo...@iqor.com wrote: Thanks again, and sorry to bother you guys, but I'm new to GitHub and don't know what to do from here. Can you point me to the right place where I can take the next step to put this patch on my server? I only know how to untar the tarball I downloaded from the main ES page. Thanks. Tony ...
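For reference, building a 1.x-era Elasticsearch from source was a Maven job; a sketch, assuming git and Maven are installed and that the fix has been merged into the branch you check out:

git clone https://github.com/elasticsearch/elasticsearch.git
cd elasticsearch
git checkout 1.x        # or whichever branch/tag carries the fix
mvn clean package -DskipTests

The distribution tarball then shows up under target/releases/ and can be untarred and run just like the official download.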
Re: which class file triggers writing of segments.gen / segments_1
Thanks Jörg, I read this link https://lucene.apache.org/core/4_8_1/core/org/apache/lucene/index/SegmentInfos.html, very informative. I found a few spots that call the SegmentInfos class:

https://github.com/elasticsearch/elasticsearch/blob/v1.2.1/src/main/java/org/elasticsearch/index/gateway/local/LocalIndexShardGateway.java
https://github.com/elasticsearch/elasticsearch/blob/v1.2.1/src/main/java/org/elasticsearch/common/lucene/Lucene.java
https://github.com/elasticsearch/elasticsearch/blob/v1.2.1/src/main/java/org/elasticsearch/index/engine/internal/InternalEngine.java
https://github.com/elasticsearch/elasticsearch/blob/v1.2.1/src/main/java/org/elasticsearch/index/merge/policy/ElasticsearchMergePolicy.java
https://github.com/elasticsearch/elasticsearch/blob/v1.2.1/src/main/java/org/elasticsearch/index/snapshots/blobstore/BlobStoreIndexShardRepository.java

I understand that both segments files are written by Lucene, but during indexing, do you know which class in Elasticsearch eventually triggers the underlying writing of the segments files? /Jason

On Fri, Aug 29, 2014 at 8:49 PM, joergpra...@gmail.com wrote: This is Lucene, when indexing starts. Look at the SegmentInfos class: https://lucene.apache.org/core/4_9_0/core/org/apache/lucene/index/SegmentInfos.html Jörg
Re: Not able to fulltext index Microsoft Office documents - PDF works fine
Hi David, I am currently using elasticsearch-1.3.1. Will mapper-attachments-2.3.2 be compatible with my version of ES, or will I have to update? Thanks, - Kyle
Re: Not able to fulltext index Microsoft Office documents - PDF works fine
It will work with 1.3.1. You should update to 1.3.2 though, because we fixed some issues in this version. -- David ;-) Twitter: @dadoonet / @elasticsearchfr / @scrutmydocs

On 29 August 2014, at 16:07, feenz kfeeney5...@gmail.com wrote: Hi David, I am currently using elasticsearch-1.3.1. Will mapper-attachments-2.3.2 be compatible with my version of ES, or will I have to update? Thanks, - Kyle
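Installing that plugin version on a 1.3.x node would look something like the following (a sketch; run from the Elasticsearch home directory and restart the node afterwards so the plugin is picked up):

bin/plugin --install elasticsearch/elasticsearch-mapper-attachments/2.3.2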
Multi-field collapsing
I have a use case which requires collapsing on multiple fields. As a simple example, assume I have some movie documents indexed with the fields Director, Actor, Title, and Release Date. I want to be able to collapse on Director and Actor, getting the most recent movie (as indicated by Release Date). I think the new top hits aggregation gets me most of what I need: I can create a terms aggregation on Director, with a sub terms aggregation on Actor, and add a top hits aggregation to that (size 1). Would this be the proper approach? By traversing the aggregations I can get all of the hits that I want; however, I can't have Elasticsearch sort or page them. It's almost like I'd need a hitCollector aggregation which would collect all search hits generated by its sub-aggregations and allow me to specify sort and paging information at that level. Thoughts? Brian
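A sketch of the aggregation Brian describes (the index and field names come from the example and are illustrative; untested):

curl -XGET 'localhost:9200/movies/_search' -d '{
  "size": 0,
  "aggs": {
    "by_director": {
      "terms": { "field": "director" },
      "aggs": {
        "by_actor": {
          "terms": { "field": "actor" },
          "aggs": {
            "latest_movie": {
              "top_hits": {
                "size": 1,
                "sort": [ { "release_date": { "order": "desc" } } ]
              }
            }
          }
        }
      }
    }
  }
}'

This returns the most recent movie per director/actor bucket, but, as noted, sorting and paging across all the returned top hits still has to happen client-side.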
Re: Does the transport client do scatter-gather?
I'm not exactly sure what you mean by scatter-gather, but yes, both clients can execute requests on all nodes of the cluster. Jörg

On Fri, Aug 29, 2014 at 3:43 PM, John Smith java.dev@gmail.com wrote: Just as the subject asks, or can only the node client do scatter-gather? Thanks
Re: What the heck is this search?? :)
Hi Boaz, Thanks for the reply. :) It's not a problem per se. I'm working through performance/memory issues, turned on the slow log, and that query popped up. It's a problem because it's slow, but it's not causing cluster stability issues! It's interesting that you think it is Kibana though. I removed the Head plugin for 3 days and didn't see that query logged once, so I was pretty sure it was the culprit! Maybe it was just a coincidence that whatever in Kibana was doing it didn't happen then either. Just my luck. ;) Thanks again. Chris

On Thu, Aug 28, 2014 at 3:48 PM, Boaz Leskes b.les...@gmail.com wrote: Hi Chris, This is actually Kibana. The reason it uses query_string is to allow people some kind of syntax in their query with no query parsing on the client side. Just a decision which I guess was made long ago to keep things simple. Is this a problem for you in any way? Cheers, Boaz

On Thursday, August 21, 2014 6:37:02 PM UTC+2, Chris Neal wrote: Done. Will report back. Thank you!

On Thu, Aug 21, 2014 at 11:27 AM, Itamar Syn-Hershko ita...@code972.com wrote: I'm going to bet on Head. Disable it and see what happens. -- Itamar Syn-Hershko http://code972.com | @synhershko Freelance Developer & Consultant, Author of RavenDB in Action http://manning.com/synhershko/

On Thu, Aug 21, 2014 at 7:22 PM, Chris Neal chris.n...@derbysoft.net wrote: Thanks guys for the thoughts. Plugins didn't even occur to me, but they should have. We've got Marvel, Head, and ElasticHQ installed. Is there some way to tell where the search is coming from? Something like an HTTP access log or something? Thanks again for your time! Chris

On Wed, Aug 20, 2014 at 3:57 PM, Itamar Syn-Hershko ita...@code972.com wrote: I thought of Kibana because there's a faceting operation on the _type field. But I doubt either Marvel or Kibana would issue such an awful query (notice the fquery bit, too). Any part of your system (plugin or other) which might want to look at the types of documents added to an ES index?

On Wed, Aug 20, 2014 at 11:53 PM, Ivan Brusic i...@brusic.com wrote: Very strange query indeed. Wildcard search filtered by a match_all. What?!? It is not Elasticsearch, but perhaps some plugin. Itamar mentioned Kibana, although you did not mention it in your post. Any other plugins? Marvel? -- Ivan

On Wed, Aug 20, 2014 at 12:43 PM, Itamar Syn-Hershko ita...@code972.com wrote: There is no such thing as a query internal to ES; if you see this in the logs, you have a client making it. I would point to a Kibana instance, but I'm pretty sure Kibana won't use a query_string query like this. And yes, this is quite an expensive query (and facets) to run on a decent-sized installation.

On Wed, Aug 20, 2014 at 10:14 PM, Chris Neal chris.n...@derbysoft.net wrote: Hi guys, I'm working through some performance concerns in my cluster, and I turned on the slow log feature.
I'm seeing this in the index_search_slowlog.log log:

[2014-08-20 06:37:52,734][INFO ][index.search.slowlog.query] [elasticsearch-ip-10-0-0-41] [index-20140731][0] took[6s], took_millis[6081], types[], stats[], search_type[QUERY_THEN_FETCH], total_shards[86], source[{"facets":{"terms":{"terms":{"field":"_type","size":100,"order":"count","exclude":[]},"facet_filter":{"fquery":{"query":{"filtered":{"query":{"bool":{"should":[{"query_string":{"query":"*"}}]}},"filter":{"bool":{"must":[{"match_all":{}}]}}}}}}}},"size":0}], extra_source[],

Is that a user-generated search, or something internal to ES maybe? I can't even tell what it's trying to do. It seems to hit every one of my indexes though, as the same search query is logged 63 times in a one-minute period. Any ideas what this is? Is it something to be concerned about? Thanks for the help! Chris
Re: JVM crash on 64 bit SPARC with Elasticsearch 1.2.2 due to unaligned memory access
The snapshot repo is still active, but it is a bit behind and does not include this patch: https://oss.sonatype.org/content/repositories/snapshots/org/elasticsearch/elasticsearch/ -- Ivan On Fri, Aug 29, 2014 at 8:21 AM, joergpra...@gmail.com joergpra...@gmail.com wrote: Do you want to build from source? Or do you want to install a fresh binary? At jenkins.elasticsearch.org I can not find any snapshot builds but it may be just me. It would be a nice add-on to provide snapshot builds for users that eagerly await bug fixes or take a ride on the bleeding edge before the next release arrives, without release notes etc. Jörg On Fri, Aug 29, 2014 at 4:29 PM, tony.apo...@iqor.com wrote: Thanks again and sorry to bother you guys but I'm new to Github and don't know what do do from here. Can you point me to the right place where I can take the next step to put this patch on my server? I only know how to untar the tarball I downloaded from the main ES page. Thanks. Tony On Wednesday, August 27, 2014 1:35:06 PM UTC-4, tony@iqor.com wrote: Kudos! Tony On Wednesday, August 27, 2014 1:16:11 PM UTC-4, Jörg Prante wrote: All praise should go to the fantastic Elasticsearch team who did not hesitate to test the fix immediately and replaced it with a better working solution, since the lzf-compress software is having weaknesses regarding threadsafety. Jörg On Wed, Aug 27, 2014 at 7:01 PM, Ivan Brusic iv...@brusic.com wrote: Amazing job. Great work. -- Ivan On Tue, Aug 26, 2014 at 12:41 PM, joerg...@gmail.com joerg...@gmail.com wrote: I fixed the issue by setting the safe LZF encoder in LZFCompressor and opened a pull request https://github.com/elasticsearch/elasticsearch/pull/7466 Jörg On Tue, Aug 26, 2014 at 8:17 PM, joerg...@gmail.com joerg...@gmail.com wrote: Still broken with lzf-compress 1.0.3 https://gist.github.com/jprante/d2d829b497db4963aea5 Jörg On Tue, Aug 26, 2014 at 7:54 PM, joerg...@gmail.com joerg...@gmail.com wrote: Thanks for the logstash mapping command. I can reproduce it now. It's the LZF encoder that bails out at org.elasticsearch.common. compress.lzf.impl.UnsafeChunkEncoderBE._getInt which uses in turn sun.misc.Unsafe.getInt I have created a gist of the JVM crash file at https://gist.github.com/jprante/79f4b4c0b9fd83eb1c9b There has been a fix in LZF lately https://github.com/ ning/compress/commit/db7f51bddc5b7beb47da77eeeab56882c650bff7 for version 1.0.3 which has been released recently. I will build a snapshot ES version with LZF 1.0.3 and see if this works... Jörg On Mon, Aug 25, 2014 at 11:30 PM, tony@iqor.com wrote: I captured a WireShark trace of the interaction between ES and Logstash 1.4.1. The error occurs even before my data is sent. Can you try to reproduce it on your testbed with this message I captured? curl -XPUT http://amssc103-mgmt-app2:9200/_template/logstash -d @y Contests of file 'y: { template : logstash-*, settings : { index.refresh_interval : 5s }, mappings : {_default_ : { _all : {enabled : true}, dynamic_templates : [ { string_fields : { match : *, match_mapping_type : string, mapping : { type : string, index : analyzed, omit_norms : true, fields : { raw : {type: string, index : not_analyzed, ignore_above : 256} } } } } ], properties : { @version: { type: string, index: not_analyzed }, geoip : { type : object, dynamic: true, path: full, properties : { location : { type : geo_point } } } } } }} On Monday, August 25, 2014 3:53:18 PM UTC-4, tony@iqor.com wrote: I have no plugins installed (yet) and only changed es.logger.level to DEBUG in logging.yml. 
elasticsearch.yml: cluster.name: es-AMS1Cluster node.name: KYLIE1 node.rack: amssc2client02 path.data: /export/home/apontet/elasticsearch/data path.work: /export/home/apontet/elasticsearch/work path.logs: /export/home/apontet/elasticsearch/logs network.host: = sanitized line; file contains actual server IP discovery.zen.ping.multicast.enabled: false discovery.zen.ping.unicast.hosts: [s1, s2, s3, s5 , s6, s7] = Also sanitized Thanks, Tony On Saturday, August 23, 2014 6:29:40 AM UTC-4, Jörg Prante wrote: I tested a simple Hello World document on Elasticsearch 1.3.2 with Oracle JDK 1.7.0_17 64-bit Server VM, Sparc Solaris 10, default settings. No issues. So I would like to know more about the settings in elasticsearch.yml, the mappings, and the installed plugins. Jörg On Sat, Aug 23, 2014 at 11:25 AM, joerg...@gmail.com joerg...@gmail.com wrote: I have some Solaris 10 Sparc V440/V445 servers available and can try to reproduce over the weekend. Jörg On Sat, Aug 23, 2014 at 4:37 AM, Robert Muir
Re: EL setup for fulltext search
That output does not look like something generated by the standard analyzer, since it contains uppercase letters and various non-word characters such as '='. Your two analysis requests will differ since the second one contains the default word_delimiter filter instead of your custom my_word_delimiter. What you are trying to achieve is somewhat difficult, but you can get there if you keep on tweaking. :) Try using a pattern tokenizer instead of the whitespace tokenizer if you want more control over word boundaries. -- Ivan On Fri, Aug 29, 2014 at 1:48 AM, Marc mn.off...@googlemail.com wrote: Hi Ivan, thanks again. I have tried that and found a reasonable combination. Nevertheless, when I now try to use the analyze API with an index that has said analyzer defined via template, it doesn't seem to apply. This is the complete template: { template: bogstash-*, settings: { index.number_of_replicas: 0, analysis: { analyzer: { msg_excp_analyzer: { type: custom, tokenizer: whitespace, filters: [word_delimiter, lowercase, asciifolding, shingle, standard] } }, filters: { my_word_delimiter: { type: word_delimiter, preserve_original: true }, my_asciifolding: { type: asciifolding, preserve_original: true } } } }, mappings: { _default_: { properties: { @excp: { type: string, index: analyzed, analyzer: msg_excp_analyzer }, @msg: { type: string, index: analyzed, analyzer: msg_excp_analyzer } } } } } I create the index bogstash-1. Now I test the following: curl -XGET 'localhost:9200/bogstash-1/_analyze?analyzer=msg_excp_analyzer&pretty=1' -d 'Service=MyMDB.onMessage appId=cs Times=Me:22/Total:22 (updated attributes=gps_lng: 183731222/ gps_lat: 289309222/ )' and it returns: { tokens : [ { token : Service=MyMDB.onMessage, start_offset : 0, end_offset : 23, type : word, position : 1 }, { token : appId=cs, start_offset : 24, end_offset : 32, type : word, position : 2 }, { token : Times=Me:22/Total:22, start_offset : 33, end_offset : 53, type : word, position : 3 }, { token : (updated, start_offset : 54, end_offset : 62, type : word, position : 4 }, { token : attributes=gps_lng:, start_offset : 63, end_offset : 82, type : word, position : 5 }, { token : 183731222/, start_offset : 83, end_offset : 93, type : word, position : 6 }, { token : gps_lat:, start_offset : 94, end_offset : 102, type : word, position : 7 }, { token : 289309222/, start_offset : 103, end_offset : 113, type : word, position : 8 }, { token : ), start_offset : 114, end_offset : 115, type : word, position : 9 } ] } Which is the output of a standard analyzer.
Giving the tokenizer and filters in the analyze API directly works fine: curl -XGET 'localhost:9200/_analyze?tokenizer=whitespace&filters=lowercase,word_delimiter,shingle,asciifolding,standard&pretty=1' -d 'Service=MyMDB.onMessage appId=cs Times=Me:22/Total:22 (updated attributes=gps_lng: 183731222/ gps_lat: 289309222/ )' This results in: { tokens : [ { token : service, start_offset : 0, end_offset : 7, type : word, position : 1 }, { token : service mymdb, start_offset : 0, end_offset : 13, type : shingle, position : 1 }, { token : mymdb, start_offset : 8, end_offset : 13, type : word, position : 2 }, { token : mymdb onmessage, start_offset : 8, end_offset : 23, type : shingle, position : 2 }, { token : onmessage, start_offset : 14, end_offset : 23, type : word, position : 3 }, { token : onmessage appid, start_offset : 14, end_offset : 29, type : shingle, position : 3 }, { token : appid, start_offset : 24, end_offset : 29, type : word, position : 4 }, { token : appid cs, start_offset : 24, end_offset : 32, type : shingle, position : 4 }, { token : cs, start_offset : 30, end_offset : 32, type : word, position : 5 }, { token : cs times, start_offset : 30, end_offset : 38, type : shingle,
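One likely reason the template analyzer never took effect: Elasticsearch expects the key filter, not filters, both for a custom analyzer's chain and for the section defining custom filters, so the settings above define my_word_delimiter and my_asciifolding but never wire them in, and the analyzer references the stock word_delimiter rather than my_word_delimiter. A corrected sketch of the analysis block (untested against a 1.x cluster, so treat it as an assumption to verify):

"analysis": {
  "analyzer": {
    "msg_excp_analyzer": {
      "type": "custom",
      "tokenizer": "whitespace",
      "filter": ["my_word_delimiter", "lowercase", "my_asciifolding", "shingle"]
    }
  },
  "filter": {
    "my_word_delimiter": { "type": "word_delimiter", "preserve_original": true },
    "my_asciifolding": { "type": "asciifolding", "preserve_original": true }
  }
}

After recreating bogstash-1 from a fixed template, the same _analyze call against the index should return the lowercased, delimiter-split shingles instead of raw whitespace tokens.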
How big can/should you scale Elasticsearch
We are trying to implement a 5 TB, 10 Billion item Elasticsearch cluster. The key is an integer and the item data is fairly small. We seem to run into issues around loading. Seems to slow down as the index gets bigger. We are doing this on EC2 i2.xlarge nodes. How many documents/TB do you think we can load per node max? So if we can do 2 Billion each then we need 5 nodes. We are trying to size it. Any advice is welcome. Even if it is that this is not a good thing to do :) thanks -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/3faa4de9-0a27-49dc-8f68-ceebd5569da9%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: JVM crash on 64 bit SPARC with Elasticsearch 1.2.2 due to unaligned memory access
The easiest for me is to install fresh binaries but I'm not shy about learning about Maven while I build it from source. Thanks Tony On Friday, August 29, 2014 11:21:34 AM UTC-4, Jörg Prante wrote: Do you want to build from source? Or do you want to install a fresh binary? At jenkins.elasticsearch.org I can not find any snapshot builds but it may be just me. It would be a nice add-on to provide snapshot builds for users that eagerly await bug fixes or take a ride on the bleeding edge before the next release arrives, without release notes etc. Jörg On Fri, Aug 29, 2014 at 4:29 PM, tony@iqor.com javascript: wrote: Thanks again and sorry to bother you guys but I'm new to Github and don't know what do do from here. Can you point me to the right place where I can take the next step to put this patch on my server? I only know how to untar the tarball I downloaded from the main ES page. Thanks. Tony On Wednesday, August 27, 2014 1:35:06 PM UTC-4, tony@iqor.com wrote: Kudos! Tony On Wednesday, August 27, 2014 1:16:11 PM UTC-4, Jörg Prante wrote: All praise should go to the fantastic Elasticsearch team who did not hesitate to test the fix immediately and replaced it with a better working solution, since the lzf-compress software is having weaknesses regarding threadsafety. Jörg On Wed, Aug 27, 2014 at 7:01 PM, Ivan Brusic iv...@brusic.com wrote: Amazing job. Great work. -- Ivan On Tue, Aug 26, 2014 at 12:41 PM, joerg...@gmail.com joerg...@gmail.com wrote: I fixed the issue by setting the safe LZF encoder in LZFCompressor and opened a pull request https://github.com/elasticsearch/elasticsearch/pull/7466 Jörg On Tue, Aug 26, 2014 at 8:17 PM, joerg...@gmail.com joerg...@gmail.com wrote: Still broken with lzf-compress 1.0.3 https://gist.github.com/jprante/d2d829b497db4963aea5 Jörg On Tue, Aug 26, 2014 at 7:54 PM, joerg...@gmail.com joerg...@gmail.com wrote: Thanks for the logstash mapping command. I can reproduce it now. It's the LZF encoder that bails out at org.elasticsearch.common. compress.lzf.impl.UnsafeChunkEncoderBE._getInt which uses in turn sun.misc.Unsafe.getInt I have created a gist of the JVM crash file at https://gist.github.com/jprante/79f4b4c0b9fd83eb1c9b There has been a fix in LZF lately https://github.com/ ning/compress/commit/db7f51bddc5b7beb47da77eeeab56882c650bff7 for version 1.0.3 which has been released recently. I will build a snapshot ES version with LZF 1.0.3 and see if this works... Jörg On Mon, Aug 25, 2014 at 11:30 PM, tony@iqor.com wrote: I captured a WireShark trace of the interaction between ES and Logstash 1.4.1. The error occurs even before my data is sent. Can you try to reproduce it on your testbed with this message I captured? curl -XPUT http://amssc103-mgmt-app2:9200/_template/logstash -d @y Contests of file 'y: { template : logstash-*, settings : { index.refresh_interval : 5s }, mappings : {_default_ : { _all : {enabled : true}, dynamic_templates : [ { string_fields : { match : *, match_mapping_type : string, mapping : { type : string, index : analyzed, omit_norms : true, fields : { raw : {type: string, index : not_analyzed, ignore_above : 256} } } } } ], properties : { @version: { type: string, index: not_analyzed }, geoip : { type : object, dynamic: true, path: full, properties : { location : { type : geo_point } } } } } }} On Monday, August 25, 2014 3:53:18 PM UTC-4, tony@iqor.com wrote: I have no plugins installed (yet) and only changed es.logger.level to DEBUG in logging.yml. 
elasticsearch.yml: cluster.name: es-AMS1Cluster node.name: KYLIE1 node.rack: amssc2client02 path.data: /export/home/apontet/elasticsearch/data path.work: /export/home/apontet/elasticsearch/work path.logs: /export/home/apontet/elasticsearch/logs network.host: = sanitized line; file contains actual server IP discovery.zen.ping.multicast.enabled: false discovery.zen.ping.unicast.hosts: [s1, s2, s3, s5 , s6, s7] = Also sanitized Thanks, Tony On Saturday, August 23, 2014 6:29:40 AM UTC-4, Jörg Prante wrote: I tested a simple Hello World document on Elasticsearch 1.3.2 with Oracle JDK 1.7.0_17 64-bit Server VM, Sparc Solaris 10, default settings. No issues. So I would like to know more about the settings in elasticsearch.yml, the mappings, and the installed plugins. Jörg On Sat, Aug 23, 2014 at 11:25 AM, joerg...@gmail.com joerg...@gmail.com wrote: I have some Solaris 10 Sparc V440/V445 servers available and can try to reproduce over the weekend. Jörg On Sat, Aug 23, 2014 at
Re: Explicitly Copying Replica Shards That Fail to Start
I used to apply that trick all the time with older versions of Elasticsearch! Thankfully it has not occurred to me in years. -- Ivan On Thu, Aug 28, 2014 at 3:53 PM, Mark Walkom ma...@campaignmonitor.com wrote: Yep, the easiest way is to drop the replica and then add it back and see how you go. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 29 August 2014 08:40, David Kleiner david.klei...@gmail.com wrote: Greetings, I am still having a problem with recovery of 5 replica shards in 2 indices of mine, 3-way cluster. The replica shards fail to initialize and are jumping around two secondary nodes. The primary shards are fine. What is my path to recovery? Is copying master shard to secondary nodes a correct way? I tried issuing routing commands to cancel recovery/allocation, it helped with some secondary shards but not with the 5 in question. I also tried dumping index with failing secondary shards but two nodes crashed (well, lost connection to cluster) so dump failed. Would setting replica # to 0, copying masters to 2 nodes and setting replica # to 1 a viable alternative? Thank you, David -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8e7c4f11-2790-49d6-8c65-87e9aa05aa3b%40googlegroups.com https://groups.google.com/d/msgid/elasticsearch/8e7c4f11-2790-49d6-8c65-87e9aa05aa3b%40googlegroups.com?utm_medium=emailutm_source=footer . For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAEM624ZpLMWPg95joA023WT3hS7AsS1x4%3DN4E5UUWuyt_LAWtg%40mail.gmail.com https://groups.google.com/d/msgid/elasticsearch/CAEM624ZpLMWPg95joA023WT3hS7AsS1x4%3DN4E5UUWuyt_LAWtg%40mail.gmail.com?utm_medium=emailutm_source=footer . For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQCTZzDLj-roqZYV40sf8QrF7K_OB1oOAAVNv8N7m9zp-A%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
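For anyone landing here later, the drop-and-rebuild trick Mark describes is just two settings calls (a sketch; myindex is a placeholder, and note you run without redundancy until the new replicas finish initializing):

curl -XPUT 'localhost:9200/myindex/_settings' -d '{ "index": { "number_of_replicas": 0 } }'
# wait for cluster health to report green, then restore the replicas:
curl -XPUT 'localhost:9200/myindex/_settings' -d '{ "index": { "number_of_replicas": 1 } }'

The first call deletes the bad replica copies; the second rebuilds fresh copies from the intact primaries.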
Re: Does transport client do scatter gather?
According to this... http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-node.html Non data nodes (I assume Node client is equivalent of a non data node) is capable of scatter/gather searching. Was wondering if transport can do this also? 2- Does transport support routing if you specify routing field? Or does it always round robin regardless? On Aug 29, 2014 12:09 PM, joergpra...@gmail.com joergpra...@gmail.com wrote: I'm not exactly sure what you mean by scatter-gather, but yes, both clients can execute requests on all nodes of the cluster. Jörg On Fri, Aug 29, 2014 at 3:43 PM, John Smith java.dev@gmail.com wrote: Just as the subject asks or only the node client can do scatter gather? Thanks -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b5274032-c142-46df-91e2-f451ab9c069e%40googlegroups.com https://groups.google.com/d/msgid/elasticsearch/b5274032-c142-46df-91e2-f451ab9c069e%40googlegroups.com?utm_medium=emailutm_source=footer . For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to a topic in the Google Groups elasticsearch group. To unsubscribe from this topic, visit https://groups.google.com/d/topic/elasticsearch/70zTmEuyWHE/unsubscribe. To unsubscribe from this group and all its topics, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHdeW329GZgvOm3NG0NAuNvEUJftkSMzKTyCAzM1%2B8bFg%40mail.gmail.com https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHdeW329GZgvOm3NG0NAuNvEUJftkSMzKTyCAzM1%2B8bFg%40mail.gmail.com?utm_medium=emailutm_source=footer . For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAMiEuFSuCrwaF6qoVf3-rsA_NjQKrJjFue62kjVvoiUH8A2rJA%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
Re: How big can/should you scale Elasticsearch
Looking at the pricing of the folks at Found (https://www.found.no/pricing/), the data on one ES server is about 8 times its memory if the cluster is to run smoothly, though I do not know how reliable that guideline is. If you have a lot of ES nodes, also consider one dedicated master node without data; it's a best practice I have read somewhere. By that rule, 16GB of memory equals 128GB of data. On Friday, August 29, 2014 7:27:28 PM UTC+2, Rob Blackin wrote: We are trying to implement a 5 TB, 10 Billion item Elasticsearch cluster. The key is an integer and the item data is fairly small. We seem to run into issues around loading. Seems to slow down as the index gets bigger. We are doing this on EC2 i2.xlarge nodes. How many documents/TB do you think we can load per node max? So if we can do 2 Billion each then we need 5 nodes. We are trying to size it. Any advice is welcome. Even if it is that this is not a good thing to do :) thanks -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/c3e4601d-8564-47f6-b3b3-0fdb91fac96e%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
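As a rough worked example of that 8x rule of thumb (the instance figure is from memory, so double-check it): an i2.xlarge has about 30 GB of RAM, giving roughly 30 GB x 8 = 240 GB of data per node, and 5 TB / 240 GB is about 21 nodes, considerably more than the 5 nodes a pure document-count estimate suggests.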
Re: JVM crash on 64 bit SPARC with Elasticsearch 1.2.2 due to unaligned memory access
Quick guide: - install Java 7 (or Java 8), Apache Maven, and git, also ensure internet connection to the Maven central repo - clone 1.3 branch only (you could also clone the whole repo and switch to the branch): git clone https://github.com/elasticsearch/elasticsearch.git --branch 1.3 --single-branch es-1.3 - enter folder es-1.3 - start build: mvn -DskipTests clean install - wait a few minutes while Maven loads all dependent artifacts and compiles ~3000 source files The result will be a complete build of all binaries. In the 'target' folder, after the Build complete message of Maven, you will see a file elasticsearch-VERSION.jar VERSION is something like 1.3.3-SNAPSHOT. You can copy this file into your existing Elasticsearch 1.3.x installation lib folder. Do not forget to adjust bin/elasticsearch.in.sh to point to the new elasticsearch-VERSION.jar file in the classpath configuration (at the top lines). This must be the first jar on the classpath so it can patch Lucene jars. If you have already data in the existing Elasticsearch I recommend to backup everything before starting the new snapshot build - no guarantees, use at your own risk. Jörg On Fri, Aug 29, 2014 at 7:36 PM, tony.apo...@iqor.com wrote: The easiest for me is to install fresh binaries but I'm not shy about learning about Maven while I build it from source. Thanks Tony On Friday, August 29, 2014 11:21:34 AM UTC-4, Jörg Prante wrote: Do you want to build from source? Or do you want to install a fresh binary? At jenkins.elasticsearch.org I can not find any snapshot builds but it may be just me. It would be a nice add-on to provide snapshot builds for users that eagerly await bug fixes or take a ride on the bleeding edge before the next release arrives, without release notes etc. Jörg On Fri, Aug 29, 2014 at 4:29 PM, tony@iqor.com wrote: Thanks again and sorry to bother you guys but I'm new to Github and don't know what do do from here. Can you point me to the right place where I can take the next step to put this patch on my server? I only know how to untar the tarball I downloaded from the main ES page. Thanks. Tony On Wednesday, August 27, 2014 1:35:06 PM UTC-4, tony@iqor.com wrote: Kudos! Tony On Wednesday, August 27, 2014 1:16:11 PM UTC-4, Jörg Prante wrote: All praise should go to the fantastic Elasticsearch team who did not hesitate to test the fix immediately and replaced it with a better working solution, since the lzf-compress software is having weaknesses regarding threadsafety. Jörg On Wed, Aug 27, 2014 at 7:01 PM, Ivan Brusic iv...@brusic.com wrote: Amazing job. Great work. -- Ivan On Tue, Aug 26, 2014 at 12:41 PM, joerg...@gmail.com joerg...@gmail.com wrote: I fixed the issue by setting the safe LZF encoder in LZFCompressor and opened a pull request https://github.com/elasticsearch/elasticsearch/pull/7466 Jörg On Tue, Aug 26, 2014 at 8:17 PM, joerg...@gmail.com joerg...@gmail.com wrote: Still broken with lzf-compress 1.0.3 https://gist.github.com/jprante/d2d829b497db4963aea5 Jörg On Tue, Aug 26, 2014 at 7:54 PM, joerg...@gmail.com joerg...@gmail.com wrote: Thanks for the logstash mapping command. I can reproduce it now. 
It's the LZF encoder that bails out at org.elasticsearch.common.co mpress.lzf.impl.UnsafeChunkEncoderBE._getInt which uses in turn sun.misc.Unsafe.getInt I have created a gist of the JVM crash file at https://gist.github.com/jprante/79f4b4c0b9fd83eb1c9b There has been a fix in LZF lately https://github.com/ning /compress/commit/db7f51bddc5b7beb47da77eeeab56882c650bff7 for version 1.0.3 which has been released recently. I will build a snapshot ES version with LZF 1.0.3 and see if this works... Jörg On Mon, Aug 25, 2014 at 11:30 PM, tony@iqor.com wrote: I captured a WireShark trace of the interaction between ES and Logstash 1.4.1. The error occurs even before my data is sent. Can you try to reproduce it on your testbed with this message I captured? curl -XPUT http://amssc103-mgmt-app2:9200/_template/logstash -d @y Contests of file 'y: { template : logstash-*, settings : { index.refresh_interval : 5s }, mappings : {_default_ : { _all : {enabled : true}, dynamic_templates : [ { string_fields : { match : *, match_mapping_type : string, mapping : { type : string, index : analyzed, omit_norms : true, fields : { raw : {type: string, index : not_analyzed, ignore_above : 256} } } } } ], properties : { @version: { type: string, index: not_analyzed }, geoip : { type : object, dynamic: true, path: full, properties : { location : { type : geo_point } } } }} }} On Monday, August 25, 2014 3:53:18 PM UTC-4, tony@iqor.com wrote: I have no plugins
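Condensing Jörg's build guide above into commands (a sketch: the exact SNAPSHOT version string and the install path will differ on your machine):

# clone only the 1.3 branch and build, skipping the test suite
git clone https://github.com/elasticsearch/elasticsearch.git --branch 1.3 --single-branch es-1.3
cd es-1.3
mvn -DskipTests clean install
# the snapshot jar lands in target/; copy it into the lib folder of an existing 1.3.x install
cp target/elasticsearch-1.3.3-SNAPSHOT.jar /opt/elasticsearch-1.3.2/lib/
# then edit bin/elasticsearch.in.sh so this jar comes first on the classpath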
Re: How big can/should you scale Elasticsearch
On Fri, Aug 29, 2014 at 1:27 PM, Rob Blackin robblac...@gmail.com wrote: We are trying to implement a 5 TB, 10 Billion item Elasticsearch cluster. The key is an integer and the item data is fairly small. We're running around 5.5TB right now without a problem. The biggest annoyance is that rolling restarts take time proportional to how much data you have. We have much larger documents than you have, so we only store 181 million or so. Our documents are interactively maintained - a consistent portion of them are updated daily, with some creates and a few rare deletes. You might want to think about how you do sharding - look into routing to see if you can get away with oversubscribing on shards. You might look into using multiple indexes as well. Shay gave a talk on how you could subdivide one large set of data into multiple indexes to help things. One 5TB index would be difficult to maintain. As are any shards that are more than, say, 20GB. Just shuffling those shards from system to system for rebalancing gets expensive. Merges on those shards have a higher upper bound on disk IO and cache thrash. We seem to run into issues around loading. Seems to slow down as the index gets bigger. Check on your merge rate. This is old but it'll give you some idea of what is going on: http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html You can tune this a bit - especially if your data comes in spurts. We are doing this on EC2 i2.xlarge nodes. How many documents/TB do you think we can load per node max? So if we can do 2 Billion each then we need 5 nodes. We are trying to size it. I can't speak to Amazon because we use physical machines. We use 18 machines with two reasonably nice Intel SSDs per machine, 96GB of RAM, and pretty sizeable CPUs, and it isn't really enough to handle the query load we want to throw at it. I imagine the shape of your load is going to be different, though. Nik -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAPmjWd1-59_4WQyKGFOsWBDmZd8iYu9agQPszwh80rB8g8vQ4Q%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
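To make the routing suggestion concrete, here is the general shape (a sketch only; the index name, type, and routing value are hypothetical): documents indexed with an explicit routing value all land on one shard, and searches passing the same value touch only that shard instead of all of them.

curl -XPUT 'localhost:9200/items/item/1?routing=42' -d '{ "key": 42, "data": "small payload" }'
curl -XGET 'localhost:9200/items/_search?routing=42' -d '{ "query": { "term": { "key": 42 } } }'

This is what makes oversubscribing on shards workable: you can allocate more shards than query fan-out would normally allow, because each routed query still hits only one of them.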
Re: Does transport client do scatter gather?
A node client is not just a non-data node although very close. The ES page describes a proxy node scenario. Example: you have many HTTP clients and they search for large result sets. This is often a challenge because of the high resource contention. One or more data-less proxy nodes can help in gathering these result sets, letting the data nodes alone, which just do the scatter part of the search. This is similar to how a TransportClient works for a JVM-only client. TransportClient is also a proxy node that gathers result sets. But with some subtle difference, you can not connect HTTP clients to a TransportClient, and because the TransportClient is not a cluster member, it uses the configured connected nodes as gather nodes within the cluster. Because there are two gather nodes, this is called an extra hop in comparison to a Java NodeClient. But, if you add the HTTP client request to the request scenario mentioned before, there is no extra hop, only an extra JVM. So the best place for TransportClient is on a remote host. In Java, NodeClient and TransportClient share the full functionality of ES, routing requests, round-robin for load balancing etc. For cluster-specific server-only services like listening to cluster state, or snapshot/restore, a TransportClient is not feasible, it can't do it or must ask a node in the cluster for passing the information. Jörg On Fri, Aug 29, 2014 at 8:54 PM, John Smith java.dev@gmail.com wrote: According to this... http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-node.html Non data nodes (I assume Node client is equivalent of a non data node) is capable of scatter/gather searching. Was wondering if transport can do this also? 2- Does transport support routing if you specify routing field? Or does it always round robin regardless? On Aug 29, 2014 12:09 PM, joergpra...@gmail.com joergpra...@gmail.com wrote: I'm not exactly sure what you mean by scatter-gather, but yes, both clients can execute requests on all nodes of the cluster. Jörg On Fri, Aug 29, 2014 at 3:43 PM, John Smith java.dev@gmail.com wrote: Just as the subject asks or only the node client can do scatter gather? Thanks -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b5274032-c142-46df-91e2-f451ab9c069e%40googlegroups.com https://groups.google.com/d/msgid/elasticsearch/b5274032-c142-46df-91e2-f451ab9c069e%40googlegroups.com?utm_medium=emailutm_source=footer . For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to a topic in the Google Groups elasticsearch group. To unsubscribe from this topic, visit https://groups.google.com/d/topic/elasticsearch/70zTmEuyWHE/unsubscribe. To unsubscribe from this group and all its topics, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHdeW329GZgvOm3NG0NAuNvEUJftkSMzKTyCAzM1%2B8bFg%40mail.gmail.com https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHdeW329GZgvOm3NG0NAuNvEUJftkSMzKTyCAzM1%2B8bFg%40mail.gmail.com?utm_medium=emailutm_source=footer . For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. 
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAMiEuFSuCrwaF6qoVf3-rsA_NjQKrJjFue62kjVvoiUH8A2rJA%40mail.gmail.com https://groups.google.com/d/msgid/elasticsearch/CAMiEuFSuCrwaF6qoVf3-rsA_NjQKrJjFue62kjVvoiUH8A2rJA%40mail.gmail.com?utm_medium=emailutm_source=footer . For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAKdsXoG%3D-7Xwe6N%3DiOK3YiT-a9EmwOAbu4KqGM1xT1Yu_FHsbQ%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
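For completeness, the data-less proxy node Jörg describes is plain configuration (a sketch for a 1.x elasticsearch.yml; such a node joins the cluster, holds no shards, and only coordinates):

node.master: false
node.data: false
# HTTP clients can point at this node; it scatters requests to the data nodes and gathers the results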
Re: JVM crash on 64 bit SPARC with Elasticsearch 1.2.2 due to unaligned memory access
Thank you very much. Tony On Friday, August 29, 2014 3:27:33 PM UTC-4, Jörg Prante wrote: Quick guide: - install Java 7 (or Java 8), Apache Maven, and git, also ensure internet connection to the Maven central repo - clone 1.3 branch only (you could also clone the whole repo and switch to the branch): git clone https://github.com/elasticsearch/elasticsearch.git --branch 1.3 --single-branch es-1.3 - enter folder es-1.3 - start build: mvn -DskipTests clean install - wait a few minutes while Maven loads all dependent artifacts and compiles ~3000 source files The result will be a complete build of all binaries. In the 'target' folder, after the Build complete message of Maven, you will see a file elasticsearch-VERSION.jar VERSION is something like 1.3.3-SNAPSHOT. You can copy this file into your existing Elasticsearch 1.3.x installation lib folder. Do not forget to adjust bin/elasticsearch.in.sh to point to the new elasticsearch-VERSION.jar file in the classpath configuration (at the top lines). This must be the first jar on the classpath so it can patch Lucene jars. If you have already data in the existing Elasticsearch I recommend to backup everything before starting the new snapshot build - no guarantees, use at your own risk. Jörg On Fri, Aug 29, 2014 at 7:36 PM, tony@iqor.com javascript: wrote: The easiest for me is to install fresh binaries but I'm not shy about learning about Maven while I build it from source. Thanks Tony On Friday, August 29, 2014 11:21:34 AM UTC-4, Jörg Prante wrote: Do you want to build from source? Or do you want to install a fresh binary? At jenkins.elasticsearch.org I can not find any snapshot builds but it may be just me. It would be a nice add-on to provide snapshot builds for users that eagerly await bug fixes or take a ride on the bleeding edge before the next release arrives, without release notes etc. Jörg On Fri, Aug 29, 2014 at 4:29 PM, tony@iqor.com wrote: Thanks again and sorry to bother you guys but I'm new to Github and don't know what do do from here. Can you point me to the right place where I can take the next step to put this patch on my server? I only know how to untar the tarball I downloaded from the main ES page. Thanks. Tony On Wednesday, August 27, 2014 1:35:06 PM UTC-4, tony@iqor.com wrote: Kudos! Tony On Wednesday, August 27, 2014 1:16:11 PM UTC-4, Jörg Prante wrote: All praise should go to the fantastic Elasticsearch team who did not hesitate to test the fix immediately and replaced it with a better working solution, since the lzf-compress software is having weaknesses regarding threadsafety. Jörg On Wed, Aug 27, 2014 at 7:01 PM, Ivan Brusic iv...@brusic.com wrote: Amazing job. Great work. -- Ivan On Tue, Aug 26, 2014 at 12:41 PM, joerg...@gmail.com joerg...@gmail.com wrote: I fixed the issue by setting the safe LZF encoder in LZFCompressor and opened a pull request https://github.com/elasticsearch/elasticsearch/pull/7466 Jörg On Tue, Aug 26, 2014 at 8:17 PM, joerg...@gmail.com joerg...@gmail.com wrote: Still broken with lzf-compress 1.0.3 https://gist.github.com/jprante/d2d829b497db4963aea5 Jörg On Tue, Aug 26, 2014 at 7:54 PM, joerg...@gmail.com joerg...@gmail.com wrote: Thanks for the logstash mapping command. I can reproduce it now. It's the LZF encoder that bails out at org.elasticsearch.common. 
compress.lzf.impl.UnsafeChunkEncoderBE._getInt which uses in turn sun.misc.Unsafe.getInt I have created a gist of the JVM crash file at https://gist.github.com/jprante/79f4b4c0b9fd83eb1c9b There has been a fix in LZF lately https://github.com/ning /compress/commit/db7f51bddc5b7beb47da77eeeab56882c650bff7 for version 1.0.3 which has been released recently. I will build a snapshot ES version with LZF 1.0.3 and see if this works... Jörg On Mon, Aug 25, 2014 at 11:30 PM, tony@iqor.com wrote: I captured a WireShark trace of the interaction between ES and Logstash 1.4.1. The error occurs even before my data is sent. Can you try to reproduce it on your testbed with this message I captured? curl -XPUT http://amssc103-mgmt-app2:9200/_template/logstash -d @y Contests of file 'y: { template : logstash-*, settings : { index.refresh_interval : 5s }, mappings : {_default_ : { _all : {enabled : true}, dynamic_templates : [ { string_fields : { match : *, match_mapping_type : string, mapping : { type : string, index : analyzed, omit_norms : true, fields : { raw : {type: string, index : not_analyzed, ignore_above : 256} } } } } ], properties : { @version: { type: string, index: not_analyzed }, geoip : { type : object, dynamic: true,
Re: Replica assignement on the same host
It's Friday. Can't read. Never mind. :) On Fri, Aug 29, 2014 at 5:06 PM, Mark Walkom ma...@campaignmonitor.com wrote: He's running multiple ES instances/nodes per physical server, i.e. a VM or container or just a second process, so I don't think it's a primary and secondary on the same ES instance. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 30 August 2014 05:16, Ivan Brusic i...@brusic.com wrote: The replica of a shard should never be on the same node as the primary. Where did you notice this anomaly? What version are you using? -- Ivan On Fri, Aug 29, 2014 at 3:52 AM, Mark Walkom ma...@campaignmonitor.com wrote: That's the best method as per http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-cluster.html#allocation-awareness Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 29 August 2014 20:45, 'Nicolas Fraison' via elasticsearch elasticsearch@googlegroups.com wrote: Hi, I have an ES cluster with 12 data nodes spread across 6 servers (so 2 nodes per server), and I saw that the replicas of a shard can be allocated on the same server (one on each of the nodes hosted by that server). To avoid this I have set these parameters on the cluster: node.host: server_name cluster.routing.allocation.awareness.attributes: zone, host But I'm wondering whether there is a specific parameter for this instead of using cluster allocation awareness? Nicolas -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4d3924ab-77e4-49ef-9039-52df801ff46d%40googlegroups.com. For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAEM624YaxHLT%3DsptzqcSQw3i9u9oozO_2DstFJ6vCs-VC_bzOw%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQCFgA6FprUZ%2BZoYsB47N4f28pXSP4%2BGfdkRn_3L%3D_tXow%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAEM624bMkjzG%3Db%3DaQdq0pd6khSsVi3BDgVa2AcChimiuxtneKA%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQA_uyaQGXdKQt8hZr1X-qq_%2Bm3Rh%2BCpG4Fg_wRqYhdN0Q%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
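Regarding Nicolas's original question: there is a dedicated setting for the multiple-nodes-per-server case, which avoids repurposing awareness attributes (a sketch; verify the setting is available in your version before relying on it):

# elasticsearch.yml
cluster.routing.allocation.same_shard.host: true
# prevents a primary and its replica from being allocated to two nodes running on the same host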