Route documents at index time to a particular shard

2014-08-21 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi, Can you please tell me if there is a plugin that I can use during indexing which will let me direct a document to a particular shard? So that I can set the shardId and send the document as the request to that shard? Thanks, Sandeep -- You received this message because you are subscribed

When does ElasticSearch reallocate shards between nodes?

2014-08-21 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi, What are the possible causes for ElasticSearch automatically reallocating a shard from one node in the cluster to another node? One, obviously, is adding a new node. What are the other automatic triggers, like continuously indexing new data or something? What is the policy for this?

Re: annual monthly aggregation

2014-08-21 Thread Emmanuel Mathot
I finally managed it with this aggregation: { time: { terms: { script: new java.text.SimpleDateFormat('MM').format(new Date(_doc.time.value)) } } } } Thank you On Wednesday, August 20, 2014 3:53:35 PM UTC+2, Ramy wrote: Have you tried something like that: ...
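To make the effect of that script concrete, here is a minimal Python sketch of the same idea: each timestamp is reduced to its two-digit month, so documents from the same month of different years land in one bucket. The sample timestamps and the `month_key` helper are hypothetical, and UTC is an assumption (the Java script in the message would use the JVM's default time zone).

```python
from collections import Counter
from datetime import datetime, timezone


def month_key(epoch_millis: int) -> str:
    """Return the two-digit month ('01'..'12') of a millisecond timestamp, in UTC."""
    return datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc).strftime("%m")


# Hypothetical sample data: epoch milliseconds, as a date field would yield.
sample_times = [
    1377043200000,  # 2013-08-21T00:00:00Z
    1408579200000,  # 2014-08-21T00:00:00Z
    1388534400000,  # 2014-01-01T00:00:00Z
]

# One bucket per month of the year, regardless of the year.
buckets = Counter(month_key(t) for t in sample_times)
print(sorted(buckets.items()))  # [('01', 1), ('08', 2)]
```

Both August documents fall into the '08' bucket even though they are a year apart, which is the "annual monthly" grouping the thread asks for.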

Re: Replica Shard stuck in INITIALIZING state

2014-08-21 Thread Peeyush Chandel
The only fix I found: I stopped deleteByQuery in my code, then dropped all replicas and recreated them. Once everything had stabilized, I restarted deleteByQuery in my code. That's how it worked for me; all back to normal. On Thursday, 21 August 2014 09:54:39 UTC+5:30, Peeyush Chandel wrote: Hi,

Re: When does ElasticSearch reallocate shards between nodes?

2014-08-21 Thread joergpra...@gmail.com
There is a formula ES uses by default to find if nodes get unbalanced regarding the shards. See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-update-settings.html#_balanced_shards Jörg On Thu, Aug 21, 2014 at 8:26 AM, 'Sandeep Ramesh Khanzode' via elasticsearch

Re: When does ElasticSearch reallocate shards between nodes?

2014-08-21 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi Jorg, Thanks. Is there a size-based allocation? It seems that we have allocation based on number of primaries, per index, per node, etc. Is there a size factor that comes into play, say, if the routing is not an even function, and shards on one node are more heavily loaded than

Re: When does ElasticSearch reallocate shards between nodes?

2014-08-21 Thread joergpra...@gmail.com
There is disk-based allocation. It does not take shard volume into account, only the remaining disk space. It is not always a good idea to use total shard volume per node as a measurement across indices; consider heavy bulk indexing with steep volume changes.

How to use Customized Query of Lucene in Elasticsearch

2014-08-21 Thread Peiyong Lin
Hi all, I have written a customized Query and Scorer for Lucene, but now I want to use this Query in Elasticsearch. I know that I can use it as a plugin, but what should I do to register it as a plugin? I have searched for documentation or examples but I found nothing mentioning how to use a

Re: Replica Shard inconsistencies disabling compression don't appear to help

2014-08-21 Thread joergpra...@gmail.com
Do you observe the replica shard inconsistency only by checksum after network transport? In other words, are you sure the inconsistency you observe is caused by a compression issue in LZF? Jörg On Thu, Aug 21, 2014 at 5:52 AM, Paul Smith tallpsm...@gmail.com wrote: Hi all, The recent ES

Re: When does ElasticSearch reallocate shards between nodes?

2014-08-21 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
I see what you mean. Is it possible that once the shards have been allocated by this formula or the policies, they will ever change again? I mean, if we have 3 indices on a two node cluster with 10 shards each. Now I add another index with 10 shards, will the EXISTING indices' shards be

My custom analyzer is registered but not used during indexing

2014-08-21 Thread Frederic Esnault
Hi everyone, I'm facing a curious problem. I defined an analyzer in my settings, this way : { *index*:{ cluster.name:test-cluster, client.transport.sniff:true, *analysis*:{ *filter*:{ french_elision:{ type:elision,

Re: Query pre-processing before execution?

2014-08-21 Thread joergpra...@gmail.com
I would rather use the analyzer/token filter machinery of Lucene for search/index extensions, plugging this into ES is a breeze. If you want field specific mangling, I would use the field mapper to create a new field type. There, you have read access to the whole (immutable) document source and

Custom analyzer registered but not used

2014-08-21 Thread Frederic Esnault
Hi everyone, I'm facing a curious problem. I defined an analyzer in my settings, this way : { *index*:{ cluster.name:test-cluster, client.transport.sniff:true, *analysis*:{ *filter*:{ *french_elision*:{ type:elision,

Configured custom analyzer registered but not used while indexing

2014-08-21 Thread Frederic Esnault
Hi everyone, I'm facing a curious problem. I configured a custom analyzer this way in my settings : { *index*:{ *cluster.name*:test-cluster, *client.transport.sniff*:true, *analysis*:{ *filter*:{ *french_elision*:{ type:elision,

Multiple Indices vs Multiple Shards

2014-08-21 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi, What is the difference in performance or additional load on ElasticSearch if I define an index with 50 shards or I define 50 indices with one shard each? I mean, technically, there are blogs that suggest that these are equivalent. Of course without the shard rebalancing, replica, failover, etc.

ELK on Hadoop

2014-08-21 Thread i9um0
Hi all, I just want to ask: what are the benefits of running either an ELK cluster only or an ELK cluster on Hadoop? regards, -mugi-

Re: Multiple Indices vs Multiple Shards

2014-08-21 Thread Mark Walkom
Very little as you have found. But you might find aggregating will take a fair bit of resources. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 21 August 2014 18:34, 'Sandeep Ramesh Khanzode' via elasticsearch

Where's my data gone?

2014-08-21 Thread sale_chris
We have a small Elasticsearch cluster consisting of 3 nodes. Looking back at some historic data that we are holding, it would appear that some of our data is missing. The histogram below shows two 50-minute periods where the number of events is noticeably lower. This wasn't the case when I last

Re: When does ElasticSearch reallocate shards between nodes?

2014-08-21 Thread joergpra...@gmail.com
By default, existing shards will also be reallocated. You can also move shards around as you like but I do not recommend it if you do it just for fun. The ES default setting for shard allocation is very good. http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-reroute.html

Re: Virtual memory usage of elasticsearch

2014-08-21 Thread Gokul nath
Thanks Jorg. I will look into that. On Wednesday, 20 August 2014 13:20:38 UTC+5:30, Jörg Prante wrote: If you get OOM, you should take a look at RSS (resident set size) of the process. The VIRT (virtual memory) can span GB or TB, it does not matter at all. Jörg On Wed, Aug 20, 2014 at

Re: How to do sequence matching

2014-08-21 Thread vineeth mohan
Hello Smitha, You can try the wildcard query - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-wildcard-query.html#query-dsl-wildcard-query Thanks Vineeth On Thu, Aug 21, 2014 at 8:51 AM, Smitha Gowda smithago...@gmail.com wrote: Hello Elastic

How does the match_phrase work for a field with different search_analyzer/index_analyzer ?

2014-08-21 Thread Ivan Ji
Hi all, The problem is I have a document field whose search analyzer is fielda_search and index analyzer is fielda_index. I cannot use the exact term to find the document. Here is the gist to reproduce the situation: https://gist.github.com/hxuanji/b94d9c3514d7b08005d2. My example document

Dynamically adding new fields from a root mapper

2014-08-21 Thread Jakub Kotowski
Hi, I am creating a plugin that analyzes the document being indexed and, based on the analysis, adds new fields to it. It is done from a root mapper and it works similarly to the attachment mapper. I need to use the root mapper because I need the whole document for analysis, not individual

Files not deleted on upgrade

2014-08-21 Thread Nikolas Everett
I started a rolling restart yesterday but had to stop because the disks were filling up oddly. It looks like when the node comes up it no longer deletes shards it can't use. Elasticsearch reports that the disk is nearly full but that it isn't using most of the space. When I look myself the

Re: Files not deleted on upgrade

2014-08-21 Thread Lee Hinman
On Thursday, August 21, 2014 2:44:19 PM UTC+2, Nikolas Everett wrote: I started a rolling restart yesterday but had to stop because the disks were filling up oddly. It looks like when the node comes up it no longer deletes shards it can't use. Elasticsearch reports that the disk is

Re: Example needed for Perl Search::Elasticsearch

2014-08-21 Thread Andrew Hamilton
Clinton, I get that, but for some reason it's not that easy for a novice to map them to the API. I'm used to using kibana and have only recently started messing with the Perl API to produce some automated reports. I find the API to be very robust and to have lots of features, but the lack of

Trying to setup stunnel for es.

2014-08-21 Thread John Smith
I'm running: Ubuntu 14.04 stunnel4 Elasticsearch 1.3.1 In Elasticsearch.yml I bind ES to localhost only. network.bind_host: 127.0.0.1 In my stunnel config... client = no [elasticsearch] accept = 9600 connect = 127.0.0.1:9300 cert = /etc/stunnel/stunnel.pem Then I run elasticsearch. I have not

Re: Files not deleted on upgrade

2014-08-21 Thread Nikolas Everett
Hi Lee! Thanks for responding. Ok, here goes: Version: 1.2.1-1.3.2 curl 'localhost:9200/_cat/health?v: epoch timestamp cluster status node.total node.data shards pri relo init unassign 1408630877 14:21:17 production-search-eqiad green 1717 6050 20170

Re: Files not deleted on upgrade

2014-08-21 Thread Nikolas Everett
This gist shows the error in action: https://gist.github.com/nik9000/3acdb38052dba3fbc5a0 Total - free on disk is 479163707392 But used is 238902736642 Meaning about 50% of used space isn't accounted for. But everything on that partition is in elasticsearch's directory:
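The "about 50%" figure can be checked directly from the two byte counts quoted in the message. This is only a worked-arithmetic sketch in Python; the variable names are illustrative, and reading the second number as the space Elasticsearch accounts for is an assumption based on the earlier messages in the thread.

```python
# Byte counts quoted in the message above.
total_minus_free = 479_163_707_392  # space the filesystem reports as in use
reported_used = 238_902_736_642     # space accounted for (assumed: as reported by Elasticsearch)

# Space that is occupied on disk but not accounted for.
unaccounted = total_minus_free - reported_used
share = unaccounted / total_minus_free

print(unaccounted)     # 240260970750
print(f"{share:.1%}")  # 50.1%
```

Roughly 240 GB of the occupied space is unexplained, which matches the estimate in the message.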

Re: Trying to setup stunnel for es.

2014-08-21 Thread John Smith
I set network.host: 127.0.0.1 And now both bind and publish host are bound to 127.0.0.1 and there are no exceptions. Is that right? Now theoretically I should be able to stunnel my node client from another box through the 9500 port? On Thursday, 21 August 2014 10:22:16 UTC-4, John Smith

Re: Elasticsearch 1.1.0 Java API Slower Than Curl for Certain Queries

2014-08-21 Thread Elliott Bradshaw
Hi guys, Thanks for getting back to me. Here's the JSON for the query: { bool : { must : [ { indices : { indices : index1, query : { filtered : { query : { function_score : {

Re: Files not deleted on upgrade

2014-08-21 Thread Nikolas Everett
Whatson shows this very well: https://wikitech.wikimedia.org/wiki/File:Whatson_out_of_disk.png Other points of interest: 1. We're using auto_expand_replicas. 2. The logs are totally clean. Nik On Thu, Aug 21, 2014 at 10:35 AM, Nikolas Everett nik9...@gmail.com wrote: This gist shows the

Re: Typical indexing performance

2014-08-21 Thread Andrew Gin
https://lh3.googleusercontent.com/-BWIsbc_s_OY/U_YOf2AN2dI/ATc/9W7XoGozgpA/s1600/Capture.PNG kibana-marvel output

Re: Typical indexing performance

2014-08-21 Thread Andrew Gin
https://lh3.googleusercontent.com/-_aGP62VEves/U_YOsvpOZBI/ATk/syO0Yi9uG4w/s1600/htop.png htop output

Re: Typical indexing performance

2014-08-21 Thread Andrew Gin
Original post with garbled image removed: Hi, I am getting to grips with Elasticsearch, but I do not know if I should be worried about the insert performance (see kibana output). As you can see, the insert indexing rate is about 1400, but I don't know if the load and cpu usage is cause for

query_string can't find token that _analyze shows is generated, but term query can

2014-08-21 Thread ben
I have attached a short bash script to recreate the situation. I have a fairly simple custom analyzer that I want to break on camel case so lowercase is last. Using the _analyze endpoint I can see the token I am searching for is generated by the analyzer, however searching for it with

Re: query_string can't find token that _analyze shows is generated, but term query can

2014-08-21 Thread ben
Also meant to include this in the script. echo query_string query using single quote which does not match lucene query documentation curl -XPOST $url/$defaultIndex/example/_search?pretty=true -d ' { query: { query_string: { query: name:''exampleof bug'' } } } ' On Thursday,

Re: Elasticsearch 1.1.0 Java API Slower Than Curl for Certain Queries

2014-08-21 Thread Elliott Bradshaw
Hey I think I got it! So in my curl request, I was specifying the two indexes in the rest endpoint, but I did not specify them in my java search request. My only other index is a suggest index, and I would think that it would be ignored when I ran the query, but I guess not. Thanks a lot for

Re: What the heck is this search?? :)

2014-08-21 Thread Chris Neal
Thanks guys for the thoughts. Plugins didn't even occur to me, but they should have. We've got Marvel, Head, and ElasticHQ installed. Is there some way to tell where the search is coming from? Something like an HTTP access log or something? Thanks again for your time! Chris On Wed, Aug 20,

Re: What the heck is this search?? :)

2014-08-21 Thread Itamar Syn-Hershko
I'm going to bet on Head. Disable it and see what happens. -- Itamar Syn-Hershko http://code972.com | @synhershko https://twitter.com/synhershko Freelance Developer Consultant Author of RavenDB in Action http://manning.com/synhershko/ On Thu, Aug 21, 2014 at 7:22 PM, Chris Neal

Re: Java API or REST API for client development ?

2014-08-21 Thread Andrew Mehler
I believe this is true again. Aggregations API recently changed, and code no longer compiles. The REST apis definitely change at a slower rate. On Wednesday, March 26, 2014 8:12:56 AM UTC-4, Graham Tackley wrote: Not true anymore: the java client has been compatible between minor versions

Re: What the heck is this search?? :)

2014-08-21 Thread Chris Neal
Done. Will report back. Thank you! On Thu, Aug 21, 2014 at 11:27 AM, Itamar Syn-Hershko ita...@code972.com wrote: I'm going to bet on Head. Disable it and see what happens. -- Itamar Syn-Hershko http://code972.com | @synhershko https://twitter.com/synhershko Freelance Developer

Re: query_string can't find token that _analyze shows is generated, but term query can

2014-08-21 Thread Ivan Brusic
I suspect the issue is the way the query parser works. The query phrase exampleof bug will be parsed into a query for the tokens exampleof and bug that are adjacent to each other. The issue is that you do not have two such tokens, instead you have a token with the value exampleof bug, which is a

Re: query_string can't find token that _analyze shows is generated, but term query can

2014-08-21 Thread ben
But the query is this... name:exampleof bug This should find an exact match in the field name. That exact match token exists. The syntax for lucene under Fields section shows a double quote is the correct character for this. http://lucene.apache.org/core/2_9_4/queryparsersyntax.html The term

Call when shard reallocation occurs

2014-08-21 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi, Is it possible to have my custom callback function defined and invoked by ES whenever it moves a shard from one node to another? Thanks, Sandeep

Accuracy of aggregation when having queries

2014-08-21 Thread Roxana Balaci
I am reading this post about not having maximum accuracy on aggregation results on terms, when adding the size parameter: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_document_counts_are_approximate Does this lack of

Re: query_string can't find token that _analyze shows is generated, but term query can

2014-08-21 Thread Ivan Brusic
The query string query is a phrase query "exampleof bug". The term query is looking for a single token exampleof bug. The query parser will not use your tokenizer to parse the phrase. It will tokenize based on whitespace and then apply the filters to each term. Your index does not contain the token

Re: query_string can't find token that _analyze shows is generated, but term query can

2014-08-21 Thread Ivan Brusic
Here is the Lucene issue: https://issues.apache.org/jira/browse/LUCENE-2605 -- Ivan On Thu, Aug 21, 2014 at 10:09 AM, Ivan Brusic i...@brusic.com wrote: The query string query is a phrase query "exampleof bug". The term query is looking for a single token exampleof bug. The query parser

Re: query_string can't find token that _analyze shows is generated, but term query can

2014-08-21 Thread ben
Well crap. By creating tokens that match it eliminates the exact match I'm trying correct? If I indexed two documents with each of the strings below... (assuming the tokens are generated as you stated above) exampleof bug exampleof sample bug Then ran a query: name:exampleof bug Would

Re: query_string can't find token that _analyze shows is generated, but term query can

2014-08-21 Thread Ivan Brusic
In general, if you are using the keyword tokenizer or non analyzed fields, then query string queries should probably not be used. Phrase queries and the keyword tokenizer also do not mix well. Your OR queries succeed because bug is a token in your index. -- Ivan On Thu, Aug 21, 2014 at 10:26

Re: query_string can't find token that _analyze shows is generated, but term query can

2014-08-21 Thread ben
In the ES documentation it talks about escape characters and space is one of them. Seems like if you escaped the query with a \ it would ignore that during the parsing. Thanks for your help. On Thursday, August 21, 2014 10:42:32 AM UTC-7, Ivan Brusic wrote: In general, if you are using the

geo_shape field not being returned with explicit 'fields' query

2014-08-21 Thread Daniel Sarfati
Hi all, I'm trying to query for a particular subset of fields, one of which is a geo_shape. All of the other fields are returned as expected, *aside from the geo shape*. Here is a trimmed-down version of my mapping: { house: { properties: { id: {type: string},

Re: JVM crash on 64 bit SPARC with Elasticsearch 1.2.2 due to unaligned memory access

2014-08-21 Thread tony . aponte
Hello, I installed ES 1.3.2 on a spare Solaris 11/ T4-4 SPARC server to scale out of a small x86 machine. I get a similar exception running ES with JAVA_OPTS=-d64. When Logstash 1.4.1 sends the first message I get the error below on the ES process: # # A fatal error has been detected by the

Re: query_string can't find token that _analyze shows is generated, but term query can

2014-08-21 Thread Ivan Brusic
One more thing! The match query does not go through the query parser phase. http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html#_comparison_to_query_string_field curl -XPOST http://localhost:9200/example/example/_search?pretty=true; -d ' { query: {

Can't find unit tests for reserved characters

2014-08-21 Thread ben
In the ES documentation there is a list of reserved characters. I'm looking for the unit tests that test reserved characters inside a query_string query. Could someone kindly point me in the right direction? Thanks! If you need to use any of the characters which function as operators in your

Re: query_string can't find token that _analyze shows is generated, but term query can

2014-08-21 Thread ben
Interesting! If that supports straight lucene syntax then this is golden. Our system must support full lucene syntax along with fuzzy searches which is why I've been using query_string. Thanks! On Thursday, August 21, 2014 10:55:36 AM UTC-7, Ivan Brusic wrote: One more thing! The match query

Re: Can't find unit tests for reserved characters

2014-08-21 Thread ben
More info...trying to figure out why this query throws exception. org.elasticsearch.search.SearchParseException: [example][1]: from[-1],size[-1]: Parse Failure [Failed to parse source [ { query: { query_string: { query: name:exampleof\ bug } } } ]] at

Re: Files not deleted on upgrade

2014-08-21 Thread Nikolas Everett
Moving this to https://github.com/elasticsearch/elasticsearch/issues/7386 . It's a bug, but I have no idea what caused it. Side note: after digging through the code for two hours I can't find anything that sweeps up files/directories/local shard storage that is unused. I see lots of deletes done

Re: Can't find unit tests for reserved characters

2014-08-21 Thread Clinton Gormley
That's the JSON parsing, not the query_string parsing. You need to use a double slash in JSON in order to pass a single slash, ie: { query: { query_string: { query: name:exampleof\\ bug } } } Also, re the reserved characters in the query string - that is all handled by Lucene,
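The distinction between the two parsing layers can be reproduced outside Elasticsearch. The sketch below uses Python's standard `json` module purely as a stand-in for the JSON parser in front of query_string (an assumption; Elasticsearch uses its own parser): a doubled backslash in the request body decodes to the single backslash that the query_string parser should see, while a lone backslash followed by a space is not a legal JSON escape and is rejected before the query is ever parsed.

```python
import json

# Request body as sent on the wire: the backslash is doubled, so the JSON
# parser hands a single backslash (followed by a space) to query_string.
valid_body = r'{"query": {"query_string": {"query": "name:exampleof\\ bug"}}}'
parsed = json.loads(valid_body)
query_text = parsed["query"]["query_string"]["query"]
print(query_text)  # name:exampleof\ bug

# With a single backslash, "\ " is not a valid JSON escape sequence, so the
# body fails at the JSON layer, as in the parse failure quoted in the thread.
invalid_body = r'{"query": {"query_string": {"query": "name:exampleof\ bug"}}}'
try:
    json.loads(invalid_body)
    rejected = False
except json.JSONDecodeError:
    rejected = True
print(rejected)  # True
```

Raw string literals (`r'...'`) are used so that the backslashes shown are exactly the characters in the request body, with no extra layer of Python escaping.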

Parent/Child query performance in version 1.1.2

2014-08-21 Thread Mark Greene
We are experiencing slow parent/child queries even when we run the query a second time and I wanted to know if this is just the limit of this feature within ElasticSearch. According to the ES Docs (http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/parent-child-performance.html)

Re: Files not deleted on upgrade

2014-08-21 Thread Nikolas Everett
Resolved: https://github.com/elasticsearch/elasticsearch/issues/7386 For posterity: if you nuke the contents of your node's disk after stopping Elasticsearch 1.2 but before starting Elasticsearch 1.3 then you won't end up with too much data that can't be cleared. The more nodes you upgrade the

Re: Files not deleted on upgrade

2014-08-21 Thread Lee Hinman
On 8/21/14, 9:57 PM, Nikolas Everett wrote: Resolved: https://github.com/elasticsearch/elasticsearch/issues/7386 For posterity: if you nuke the contents of your node's disk after stopping Elasticsearch 1.2 but before starting Elasticsearch 1.3 then you won't end up with too much data that

Re: Can't find unit tests for reserved characters

2014-08-21 Thread ben
I was trying to demonstrate the escaping of a space, not a slash. The ES documentation (copied in my original post) says a space must be escaped. Thanks! On Thursday, August 21, 2014 12:31:29 PM UTC-7, Clinton Gormley wrote: That's the JSON parsing, not the query_string

store custom properties along with elasticsearch cluster settings

2014-08-21 Thread Srinivasan Ramaswamy
I would like to store a custom property in the Elasticsearch cluster settings. Is it possible? I would like to store it along with cluster settings and retrieve it later. If we used Zookeeper, I can imagine using it to store a cluster level property and I can update it from time to time.

Re: JVM crash on 64 bit SPARC with Elasticsearch 1.2.2 due to unaligned memory access

2014-08-21 Thread Adrien Grand
Hi Tony, Do you have more information in the core dump file? (cf. the Core dump written line that you pasted) On Thu, Aug 21, 2014 at 7:53 PM, tony.apo...@iqor.com wrote: Hello, I installed ES 1.3.2 on a spare Solaris 11/ T4-4 SPARC server to scale out of small x86 machine. I get a similar

Re: Problem with combined nested bool filters (nested key/value matching)

2014-08-21 Thread Drew Kutcharian
I was able to get this to work. The problem was that I had to put the bool filter before the nested filter so now what I have is bool - nested - bool. On Aug 20, 2014, at 9:44 PM, Drew Kutcharian d...@venarc.com wrote: Hey Guys, Seems like there is an issue with a combined bool filter

Optimizing Elasticsearch Searches - from found.no

2014-08-21 Thread Mark Walkom
This is a great read https://www.found.no/foundation/optimizing-elasticsearch-searches/ Elasticsearch can query, filter and aggregate in many ways. Often there are several ways to solve the same problem – and possibly with very different performance characteristics. This article will cover some

Re: how to use my customer lucene analyzer(tokenizer)?

2014-08-21 Thread art
I have the same question about using an analyzer I have written as a plug-in for ElasticSearch 1.3. https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/blob/es-1.3/README.md demonstrates only how to use the tokenizers in combination with the built-in CustomAnalyzer. They do not

Re: Call when shard reallocation occurs

2014-08-21 Thread Ivan Brusic
AFAIK, there is no way to achieve such functionality. The only way I have figured out to have similar functionality is to write a plugin with a cluster state listener and have the plugin reach out to some external service. Cheers, Ivan On Thu, Aug 21, 2014 at 10:02 AM, 'Sandeep Ramesh Khanzode'

Re: How to do sequence matching

2014-08-21 Thread Smitha Gowda
Thanks that will work. One more question related to Kibana to visualize this data. For a query that matches sequence AB Once I have all the matching documents I want to plot a bar chart with x-axis: Session StartTime (Day granularity) y-axis: Mean of (LastEvent.EndTime(In this example B) -

Sustainable way to regularly purge deleted docs

2014-08-21 Thread Jonathan Foy
Hello I'm in the process of putting a two-node Elasticsearch cluster (1.1.2) into production, but I'm having a bit of trouble keeping it stable enough for comfort. Specifically, I'm trying to figure out the best way to keep the number of deleted documents under control. Both nodes are