Re: ElasticSearch built-in Jackson stream parser is fastest way to extract fields
Swati,

Well, I tend not to use the built-in Jackson parser anymore. The only advantage I've seen to stream parsing is that I can dynamically adapt to different objects in my own code. But I can't release the code since it's owned by my employer. And for most tasks these days, I use the Jackson jar files and the data binding model.

By the way, here are the only additional JAR files that I use in my Elasticsearch-based tools, beyond the Elasticsearch jars themselves.

For full Jackson support. There are later versions, but these work for now until the rest of the company moves to Java 8:

    jackson-annotations-2.2.3.jar
    jackson-core-2.2.3.jar
    jackson-databind-2.2.3.jar

This gives me the full Netty server (I got tired of looking for it buried inside ES, and found this to be very simple and easy to use). Again, there are later versions but this one works well enough:

    netty-3.5.8.Final.jar

And this is the magic that brings Netty to life. My front end simply publishes each incoming Netty MessageEvent to the LMAX Disruptor ring buffer. Then I can predefine a fixed number of background WorkHandler threads to consume the MessageEvent objects, handling each one and responding back to its client. No matter how much load is slammed into the front end, the number of Netty threads stays small since they only publish and they're done. And so the total thread count stays small even when intense bursts of clients slam the server:

    disruptor-3.2.0.jar

I hope this helps. I'd love to publish more details but this is about all I can do for now.

Brian
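P.S. I can't publish the actual code, but a minimal sketch of the wiring described above looks something like this (class names, ring size, and worker count are arbitrary placeholders, not my production values):

    import java.util.concurrent.Executors;

    import com.lmax.disruptor.EventFactory;
    import com.lmax.disruptor.EventTranslatorOneArg;
    import com.lmax.disruptor.RingBuffer;
    import com.lmax.disruptor.WorkHandler;
    import com.lmax.disruptor.dsl.Disruptor;

    import org.jboss.netty.channel.ChannelHandlerContext;
    import org.jboss.netty.channel.MessageEvent;
    import org.jboss.netty.channel.SimpleChannelUpstreamHandler;

    public class FrontEnd extends SimpleChannelUpstreamHandler
    {
      /* One slot in the ring buffer: a reference to the Netty event */
      static class Slot
      {
        MessageEvent event;
      }

      private final RingBuffer<Slot> ring;

      @SuppressWarnings("unchecked")
      public FrontEnd(int workerCount)
      {
        Disruptor<Slot> disruptor = new Disruptor<Slot>(new EventFactory<Slot>()
        {
          public Slot newInstance() { return new Slot(); }
        }, 1024 /* ring size, a power of two */, Executors.newCachedThreadPool());

        /* A fixed pool of background consumers; each handles one request
         * at a time and responds back to its client */
        WorkHandler<Slot>[] workers = new WorkHandler[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
          workers[i] = new WorkHandler<Slot>()
          {
            public void onEvent(Slot slot) throws Exception
            {
              handleAndRespond(slot.event);
            }
          };
        }
        disruptor.handleEventsWithWorkerPool(workers);
        ring = disruptor.start();
      }

      /* Netty I/O thread: publish the event and return immediately */
      @Override
      public void messageReceived(ChannelHandlerContext ctx, final MessageEvent e)
      {
        ring.publishEvent(new EventTranslatorOneArg<Slot, MessageEvent>()
        {
          public void translateTo(Slot slot, long sequence, MessageEvent ev)
          {
            slot.event = ev;
          }
        }, e);
      }

      void handleAndRespond(MessageEvent e)
      {
        /* Application logic: handle the request, write the response
         * to e.getChannel() */
      }
    }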
Consultant Needed for initial setup / audit / tuning / etc.
Looking to see if we have set up the system efficiently and correctly, as well as to get some general guidance. I'm looking for someone to provide some initial consulting (not very long) to give my current setup a nice audit and make sure I've set things up efficiently and correctly. I'm having trouble finding anyone online who doesn't want an annual contract. Is anyone here available, or do you know of someone?
Rebuilding master node caused data loss
I have a cluster with 5 data nodes and 1 master node. I decided to test a master node failure, and clearly I misunderstood exactly what is stored on the master. I turned down the VM running the master node and built a new one from scratch. I then added it to the cluster as a master. When this came online, I lost all the data that was previously in the cluster, and it started making new, clean indexes again.

Now, this isn't critical data (this is my test setup), but it still confused me. I have looked into this, and it would seem there is a default setting for gateway.local.auto_import_dangled. As I understand it, this was put in place for people like me who didn't understand what would happen if you lost a master node, and it should by default have imported the old data from each data node. If this defaulted to no and just deleted the data, I would know exactly what happened. I have looked at my configuration and I haven't set this to no, and yet the data was deleted.

Can someone clarify if this setting is no longer valid, or if the default has been changed and not documented?
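For reference, the settings I'm asking about, as I understand them from the 1.x local gateway docs (the values shown are what I believe the documented defaults to be, so please correct me if these have changed):

    # elasticsearch.yml (1.x local gateway)
    gateway.local.auto_import_dangled: yes   # yes | closed | no
    gateway.local.dangling_timeout: 2h       # how long dangling indices are kept on disk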
Re: shingle filter for sub phrase matching
Did you ever figure this out? I have the same exact issue but with different words. (One idea is sketched after the quoted post below.)

On Wednesday, July 23, 2014 at 10:37:03 AM UTC-4, Nick Tackes wrote:

I have created a gist with an analyzer that uses a shingle filter in an attempt to match sub-phrases. For instance, I have entries in the table with discrete phrases like

    EGFR Lung Cancer
    Lung Cancer

and I want to match these when searching the phrase 'EGFR related lung cancer'. My expectation is that the multi-word matches score higher than the single-word matches, for instance:

    1. Lung Cancer
    2. Lung
    3. Cancer
    4. EGFR

Additionally, I tried a standard analyzer match, but this didn't yield the desired result either. One complicating aspect to this approach is that min_shingle_size has to be 2 or more. How then would I be able to match single words like 'EGFR' or 'Lung'?

thanks

https://gist.github.com/nicktackes/ffdbf22aba393efc2169.js
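Returning to the question at the top: one thing that may help (untested against the gist, and the analyzer and filter names here are made up) is the shingle filter's output_unigrams option, which, if I read the docs right, emits the original single tokens alongside the shingles. That way, single words like 'EGFR' stay matchable even with min_shingle_size set to 2:

    {
      "settings": {
        "analysis": {
          "filter": {
            "phrase_shingle": {
              "type": "shingle",
              "min_shingle_size": 2,
              "max_shingle_size": 3,
              "output_unigrams": true
            }
          },
          "analyzer": {
            "shingle_analyzer": {
              "type": "custom",
              "tokenizer": "standard",
              "filter": [ "lowercase", "phrase_shingle" ]
            }
          }
        }
      }
    }

With unigrams and shingles in the same token stream, a query for 'EGFR related lung cancer' should score multi-word shingle matches like 'lung cancer' above the single-word matches.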
Re: Creating a dynamic_template with a path_match of arbitrary depth
https://github.com/elastic/elasticsearch/issues/10467

On Friday, April 3, 2015 at 10:36:00 AM UTC-4, Brian Levine wrote:

I'm indexing documents with nested objects where some of the objects include unique ids (GUIDs). I want all such fields to be not_analyzed. The id fields always have an '_id' suffix; however, these fields can appear at arbitrary levels in the document hierarchy. I'm trying to come up with a dynamic mapping template to address this so that any field of the form *_id, regardless of the nesting depth, will be marked as not_analyzed. I don't think there's a way to specify this as a single path_match, but I just wanted to confirm that I'm not missing something.

In practice, I suppose the nesting will never go deeper than, let's say, 5. So I could define 5 path_match patterns like *_id, *.*_id, *.*.*_id, etc. Although experience shows that the moment I do this, we'll find the need to go to 6 levels ;-). Ideally, you'd be able to specify a path in Ant-like syntax, e.g., **/*_id. Maybe I'll write up an enhancement request for this.

Thanks.

-b
Creating a dynamic_template with a path_match of arbitrary depth
I'm indexing documents with nested objects where some of the objects include unique ids (GUIDs). I want all such fields to be not_analyzed. The id fields always have an '_id' suffix; however, these fields can appear at arbitrary levels in the document hierarchy. I'm trying to come up with a dynamic mapping template to address this so that any field of the form *_id, regardless of the nesting depth, will be marked as not_analyzed. I don't think there's a way to specify this as a single path_match, but I just wanted to confirm that I'm not missing something.

In practice, I suppose the nesting will never go deeper than, let's say, 5. So I could define 5 path_match patterns like *_id, *.*_id, *.*.*_id, etc. Although experience shows that the moment I do this, we'll find the need to go to 6 levels ;-). Ideally, you'd be able to specify a path in Ant-like syntax, e.g., **/*_id. Maybe I'll write up an enhancement request for this.

Thanks.

-b
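P.S. The closest thing I've found so far: if, as I read the docs, a dynamic template's match is applied to the leaf field name regardless of its depth, then match rather than path_match may already cover arbitrary nesting. A sketch (untested, and the template name is arbitrary):

    {
      "mappings": {
        "_default_": {
          "dynamic_templates": [
            {
              "ids_not_analyzed": {
                "match": "*_id",
                "match_mapping_type": "string",
                "mapping": { "type": "string", "index": "not_analyzed" }
              }
            }
          ]
        }
      }
    }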
Re: Do I have to explicitly exclude the _all field in queries?
OK, I think I figured this out. It's the space between Brian and Levine. The query:

    "query": "Node.author:Brian Levine"

is actually interpreted as

    Node.author:Brian OR Levine

in which case Levine is searched for in the _all field. Seems so obvious now! ;-) (A sketch of the quoting fix follows the original post below.)

-b

On Tuesday, March 24, 2015 at 1:50:42 PM UTC-4, Brian Levine wrote:

Hi all, I clearly haven't completely grokked something in how QueryString queries are interpreted. Consider the following query:

    {
      "query": {
        "query_string": {
          "analyze_wildcard": true,
          "query": "Node.author:Brian Levine"
        }
      },
      "fields": [ "Node.author" ],
      "explain": true
    }

Note: the Node.author field is not_analyzed. The results from this query include documents for which the Node.author field contains neither Brian nor Levine. In examining the explanation, I found that the documents were included because another field in the document contained Levine. A snippet from the explanation shows that the _all field was considered:

    {
      "value": 0.08775233,
      "description": "weight(_all:levine in 464) [PerFieldSimilarity], result of:",
      ...

Do I need to explicitly exclude the _all field in the query?

Separate question: because the Node.author field is not_analyzed, I had thought that the value Brian Levine would also not be analyzed, and therefore only documents whose Node.author field contained exactly Brian Levine would be matched. Yet the explanation shows that the brian and levine tokens were considered. I also noticed that if I change the query to:

    "query": "Node.author:(Brian Levine)"

then the result set changes. Only the documents whose Node.author field contains either brian OR levine are included (which is what I would have expected). According to the explanation, the _all field is not considered in this query.

So I'm confused. Clearly, I don't understand how my original query is interpreted. Hopefully, someone can enlighten me. Thanks.

-brian
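As promised above, the sketch: quoting the phrase keeps both words together, so the query parser hands 'Brian Levine' to the not_analyzed Node.author field as a single unit instead of splitting on the space (untested against this exact index):

    {
      "query": {
        "query_string": {
          "analyze_wildcard": true,
          "query": "Node.author:\"Brian Levine\""
        }
      },
      "fields": [ "Node.author" ]
    }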
Do I have to explicitly exclude the _all field in queries?
Hi all, I clearly haven't completely grokked something in how QueryString queries are interpreted. Consider the following query:

    {
      "query": {
        "query_string": {
          "analyze_wildcard": true,
          "query": "Node.author:Brian Levine"
        }
      },
      "fields": [ "Node.author" ],
      "explain": true
    }

Note: the Node.author field is not_analyzed. The results from this query include documents for which the Node.author field contains neither Brian nor Levine. In examining the explanation, I found that the documents were included because another field in the document contained Levine. A snippet from the explanation shows that the _all field was considered:

    {
      "value": 0.08775233,
      "description": "weight(_all:levine in 464) [PerFieldSimilarity], result of:",
      ...

Do I need to explicitly exclude the _all field in the query?

Separate question: because the Node.author field is not_analyzed, I had thought that the value Brian Levine would also not be analyzed, and therefore only documents whose Node.author field contained exactly Brian Levine would be matched. Yet the explanation shows that the brian and levine tokens were considered. I also noticed that if I change the query to:

    "query": "Node.author:(Brian Levine)"

then the result set changes. Only the documents whose Node.author field contains either brian OR levine are included (which is what I would have expected). According to the explanation, the _all field is not considered in this query.

So I'm confused. Clearly, I don't understand how my original query is interpreted. Hopefully, someone can enlighten me. Thanks.

-brian
Re: Elastic and Kibana: indexing a JSON with an array field looks like a plain String
Did you happen to figure out a solution for this?
Elasticsearch throwing an “OutOfMemoryError[unable to create new native thread]” error
We just did a rolling restart of our servers, but now every few hours our cluster stops responding to API calls. Instead, when we make a call, I get a response like this:

    curl -XGET 'http://localhost:9200/_cluster/health?pretty'
    {
      "error" : "OutOfMemoryError[unable to create new native thread]",
      "status" : 500
    }

I noticed that we can still index data fine, it seems, but cannot search or call any API functions. This seems to happen every few hours, and the most recent time it happened there were no logs in any of the nodes' log files.

Our cluster is 8 nodes over 5 servers (3 servers run 2 Elasticsearch processes each, 2 run 1), running RHEL 6u5. We are running Elasticsearch 1.3.4.
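P.S. While we investigate: from what I've read, this particular OutOfMemoryError usually means the JVM hit the OS limit on threads/processes for the user, not that the heap is exhausted. So we're checking things like the following (the limits values are illustrative, not recommendations, and es_pid is a placeholder for the actual process id):

    # Max processes/threads allowed for the current user
    ulimit -u

    # Threads the ES JVM currently has
    grep Threads /proc/es_pid/status

    # /etc/security/limits.conf entries for the user running ES:
    elasticsearch  soft  nproc  4096
    elasticsearch  hard  nproc  4096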
Re: How does sorting on _id work?
Yes, the _id field is a string. You are not limited to numbers. In fact, an automatically generated ID has many non-numeric characters in it.

For what you want, you should create an id field, map it to a long integer, and then copy your _id into that id field when you load the document. Then when you sort on the id field, you will get a numeric sort.

Hope this helps.

Brian

(A sketch of this mapping follows the quoted thread below.)

On Tuesday, January 27, 2015 at 1:28:44 PM UTC-5, Abid Hussain wrote:

... can it be that _id is treated as a string? If so, is there any way to retrieve the max _id while treating _id as an integer?

On Tuesday, January 27, 2015 at 19:24:41 UTC+1, Abid Hussain wrote:

Hi all, I want to determine the doc with the max and min _id value. So, when I run this query:

    GET /my_index/order/_search
    {
      "fields": [ "_id" ],
      "sort": [ { "_uid": { "order": "desc" } } ],
      "size": 1
    }

I get a result:

    {
      ...
      "hits": {
        ...
        "hits": [
          {
            "_index": "my_index",
            "_type": "order",
            "_id": "99",
            "_score": null,
            "sort": [ "order#99" ]
          }
        ]
      }
    }

There is definitely a doc with _id value 11132106 in the index, which I would have expected as the result. And when I run the same search with order asc, I get a result with _id 100, which is higher than 99...?

What am I doing wrong?

Regards,

Abid
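As promised, a sketch of the mapping and query I described (index and field names are illustrative, and copying _id into id happens in your own loading code):

    PUT /my_index
    {
      "mappings": {
        "order": {
          "properties": {
            "id": { "type": "long" }
          }
        }
      }
    }

    GET /my_index/order/_search
    {
      "sort": [ { "id": { "order": "desc" } } ],
      "size": 1
    }

Because id is mapped as a long, the sort is numeric, and 11132106 sorts above 99 as expected.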
Re: How to find all docs where field_a === val1 and field_b === val2?
By the way, David, the full query follows:

    {
      "from" : 0,
      "size" : 20,
      "timeout" : 6,
      "query" : {
        "bool" : {
          "must" : [
            { "match" : { "field_a" : { "query" : "val1", "type" : "boolean" } } },
            { "match" : { "field_b" : { "query" : "val2", "type" : "boolean" } } }
          ]
        }
      },
      "version" : true,
      "explain" : false,
      "fields" : [ "_ttl", "_source" ]
    }

Also note that since the _ttl field is being requested (always), the _source must also be asked for explicitly. If you don't ask for any fields, _source is returned by default. But if you ask for one or more fields explicitly, then you must also ask for _source or it won't be returned.

Brian

On Wednesday, January 14, 2015 at 6:31:29 PM UTC-5, Brian wrote:

David, This is what I use. I hope it helps.

    {
      "bool" : {
        "must" : [
          { "match" : { "field_a" : { "query" : "val1", "type" : "boolean" } } },
          { "match" : { "field_b" : { "query" : "val2", "type" : "boolean" } } }
        ]
      }
    }

Brian
Re: How to find all docs where field_a === val1 and field_b === val2?
David, This is what I use. I hope it helps.

    {
      "bool" : {
        "must" : [
          { "match" : { "field_a" : { "query" : "val1", "type" : "boolean" } } },
          { "match" : { "field_b" : { "query" : "val2", "type" : "boolean" } } }
        ]
      }
    }

Brian
Re: Indices Stats using the NodeClient with the Java API
Marc,

Maybe these snippets will help? The enclosing class's constructor sets the client data member to either a Node client or a Transport client (both work fine; I prefer the TransportClient). The source string contains one or more index names, with a comma between each pair of names. A name may contain wildcards as supported by ES.

This code works for 1.3.4. Not sure if 4.X has yet another breaking change, but if that's the case, it is usually no big deal to handle.

    public static String[] parseIndex(String source)
    {
      return indexSplitter.split(source);
    }

    public String[] getIndexNames(String indexPattern) throws UtilityException
    {
      if (indexPattern.trim().isEmpty())
        throw new UtilityException("Cannot resolve empty index pattern ["
            + indexPattern + "]");

      try
      {
        /* Parse the index pattern on commas, if present. Then pass the list of
         * individual names (which may include wildcards and - signs) to
         * Elasticsearch for final resolution */
        String[] indexSpecList = parseIndex(indexPattern);

        /* Get the list of individual index names, along with their status
         * information (which we will ignore in this method) */
        IndicesAdminClient iac = client.admin().indices();
        RecoveryRequestBuilder isrb = iac.prepareRecoveries();
        isrb.setIndices(indexSpecList);
        RecoveryResponse isr = isrb.execute().actionGet();

        /* Create an array of just the names of the indices */
        ArrayList<String> indices = new ArrayList<String>();
        Map<String, List<ShardRecoveryResponse>> sr = isr.shardResponses();
        for (String index : sr.keySet())
        {
          indices.add(index.trim());
        }

        /* Be sure there is at least one index that matches the pattern */
        if (indices.isEmpty())
          throw new UtilityException("Cannot resolve index pattern ["
              + indexPattern + "] to at least one existing index");

        /* Convert to String[] and return */
        return indices.toArray(new String[indices.size()]);
      }
      catch (ElasticsearchException e)
      {
        throw new UtilityException("Cannot resolve index pattern ["
            + indexPattern + "]: " + e);
      }
    }

    private final Client client;

    private static final Pattern indexSplitter = Pattern.compile(Pattern.quote(","));

Brian

On Tuesday, January 13, 2015 at 6:56:07 AM UTC-5, Marc wrote:

Hi, I would like to get a list of the available indices in my cluster using the Java API with the node client. Currently the request is done via the REST interface, similar to this:

    http://localhost:9200/logstash-*/_stats/indices

Cheers
Marc
Re: Input file with custom delimiter
Gopi,

You really have a CSV file, but using ^ instead of , as your delimiter.

I happened to write my own CSV-to-JSON converter, giving it the options I needed (including specification or auto-detection of numbers, date format normalization, auto-creation of the action-and-metadata line, and so on). I did this before stumbling across logstash, but still found it easier to write and maintain this code myself. Choose the language you wish: I wrote one version of mine in C++ but the subsequent version in Java. I also wrote a bulk load client in Java to avoid the limitations of curl (and also its complete lack of existence on various platforms). (logstash is much better for log files; my converter is much better for generic CSV.)

I know this isn't exactly the pre-written tool you are looking for. But converting the CSV (with the option to override the delimiter values) into JSON isn't very hard to do. And once that's done, it's an easy matter to add the action and meta data and have a bulk-ready data stream.

Brian

On Wednesday, January 7, 2015 6:40:34 AM UTC-5, Gopimanikandan Sengodan wrote:

Hi All, We are planning to load data into Elasticsearch from a delimited file. The file is delimited with the 0x88 (ˆ) delimiter. Can you please let me know how to load the delimited file into Elastic? Also, please let me know the best and fastest way to load millions of records into Elasticsearch?

SAMPLE: XˆYYˆ

Thanks,
Gopi
Re: Input file with custom delimiter
I wish I could, but I am currently prohibited. However, I can point you to some very good Java libraries.

The CSV parser supplied by the Apache project works well:

    https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVParser.html

You can override the delimiter using the static CSVFormat newFormat(char delimiter) method, which creates a new CSV format with the specified delimiter:

    https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html

Then use the XContentBuilder cb = jsonBuilder() method call to create a content builder to convert your records to single-line JSON. For example, the action-and-metadata object I use is based on the following enum and toString method to emit it as JSON. I've left out the parts that I use in other custom libraries that allow Java code to easily set up this information, and also to set this from a search response or a get-by-id response:

    public enum OpType
    {
      CREATE, INDEX, DELETE
    }

    @Override
    public String toString()
    {
      try
      {
        XContentBuilder cb = jsonBuilder();

        cb.startObject();
        cb.field(opType.toString().toLowerCase());
        cb.startObject();

        cb.field("_index", index);
        cb.field("_type", type);
        if (id != null)
          cb.field("_id", id);
        if (version > 0)
        {
          cb.field("_version", version);
          if (versionType == VersionType.EXTERNAL)
            cb.field("_version_type", "external");
        }
        if (ttl != null)
          cb.field("_ttl", ttl);

        cb.endObject();
        cb.endObject();

        return cb.string();
      }
      catch (IOException e)
      {
        return (null);
      }
    }

    /* Operation type (action): create or index or delete */
    private OpType opType = OpType.INDEX;

    /* Metadata that this object supports */
    private String index = null;
    private String type = null;
    private String id = null;
    private long version = 0;
    private VersionType versionType = VersionType.INTERNAL;
    private TimeValue ttl = null;

And the actual data line that would follow is similarly constructed using the content builder.

I wish I could help you more.

Brian

On Wednesday, January 7, 2015 10:41:26 AM UTC-5, Gopimanikandan Sengodan wrote:

Thank you, Brian. Let me change it accordingly as per your suggestion. Would it be possible to share the bulk load client and CSV-to-JSON converter?
Re: Incompatible encoding when using Logstash to ship JSON files to Elasticsearch
We use the HTTP protocol from logstash to send to Elasticsearch, and therefore we have never had this issue. There is a version of ES bundled with logstash, and if it doesn't match the version of ES you are using to store the logs, then you may see problems unless you use the HTTP protocol.

Brian

On Wednesday, December 10, 2014 3:53:30 PM UTC-5, Vagif Abilov wrote:

Thank you Aaron, done. I've created an issue. But I'd like to find out if there's a workaround for this problem. What's really strange is that the same Logstash installation works with similar JSON files on other machines.
Re: Query doesn't find results
I believe that query_string has its own syntax that assumes tokenization and other preprocessing. Maybe if you added name: to the query? But I am not sure how you would tell the query string that your phrase is really one token.

But thanks for giving me one more reason to avoid Spring!

Brian
Cannot find elasticsearch sample data in kibana4 beta 2
I'm having some difficulties getting some non-logstash data to show up in Kibana 4. All logstash data works fine. I loaded up the French data as suggested on the Elasticsearch help page (http://www.elasticsearch.org/help) and everything works as far as Elasticsearch is concerned. I can successfully load, map, and query the data from the CLI.

In Kibana 4, I can add the index and it reads all of the fields. The time-field-name gives 3 options as expected (matches the mapping). I chose date_creation (2012-06-21 05:46:59) and then searched for the data in Kibana with no success. Just for kicks I loaded up Kibana 3, and I'm able to see the data with no date filtering. I changed the time_picker field to date_creation and again searched for the data in 2012. Nothing. Once I select a timeframe, the data no longer appears.

Does this seem like an Elasticsearch mapping issue or something in Kibana? Thanks in advance for any thoughts you may have.

SOFTWARE
    elasticsearch 1.4.0 Beta 1, Build 3998
    kibana 4.0.0 Beta 2
Re: Cannot find elasticsearch sample data in kibana4 beta 2
Upgraded to elasticsearch 1.4.1 - no change.

On Wednesday, December 3, 2014 12:53:42 PM UTC-5, Brian Olson wrote:

I'm having some difficulties getting some non-logstash data to show up in Kibana 4. All logstash data works fine. I loaded up the French data as suggested on the Elasticsearch help page (http://www.elasticsearch.org/help) and everything works as far as Elasticsearch is concerned. I can successfully load, map, and query the data from the CLI.

In Kibana 4, I can add the index and it reads all of the fields. The time-field-name gives 3 options as expected (matches the mapping). I chose date_creation (2012-06-21 05:46:59) and then searched for the data in Kibana with no success. Just for kicks I loaded up Kibana 3, and I'm able to see the data with no date filtering. I changed the time_picker field to date_creation and again searched for the data in 2012. Nothing. Once I select a timeframe, the data no longer appears.

Does this seem like an Elasticsearch mapping issue or something in Kibana? Thanks in advance for any thoughts you may have.

SOFTWARE
    elasticsearch 1.4.0 Beta 1, Build 3998
    kibana 4.0.0 Beta 2
Re: tweezer fixes to status-red don't work, may need sledgehammer
Just a wild guess, but it seems that the /etc/init.d/elasticsearch restart command will, if properly named, stop a currently running instance and then start it. If you issue the curl _shutdown command and then the restart command directly after, without any delays, then perhaps that double blow from your sledgehammer is causing some corruption.

In general, it's not good to mix HTTP REST (curl) commands with scripts that directly handle processes, without adequate delays to ensure they aren't hammering on each other.

Brian
Spark StreamingContext read streaming updates from ES
Existing Spark support allows us to read from or write to ES. Read support is one-shot; that is, it reads what ES has in its index now. I'd like to have a Spark thread read streaming updates from ES, using it as a source, not a sink. I was wondering if there is a way to write a Spark StreamingContext that will observe updates to ES? Something like:

    ssc.elasticSearchStream(...)

Thanks for your time.

-b
Re: Question about Logstash Joining ES Cluster and Index
I highly recommend that you use the HTTP output. It works great, is immune to the ES version, and there are no performance issues that I've seen. It Just Works. For example, here are my sample logstash configuration's output settings:

    output {
      # Uncomment for testing only:
      # stdout { codec => rubydebug }

      # Elasticsearch
      elasticsearch {
        # Specify http (with or without quotes around http) to direct the
        # output as JSON documents via the Elasticsearch HTTP REST API
        protocol => http
        codec => json
        manage_template => false

        # Or whatever target ES host is required
        host => localhost

        # Or whatever _type is desired:
        index_type => sample
      }
    }

As you can probably surmise, I have my own default index creation template, so there's no need to splatter it all over creation; logstash runs better on the host on which it's gathering the log files, and I vastly prefer one central index template to keeping a bazillion logstash configurations in perfect sync. And if we happen to replace logstash with something else, then I still have my index creation templates.

Hope this helps!

Brian
Re: Import java transportClient only
Filip,

Or, just put all of the Elasticsearch jars on your local client system, then add their containing directory (with /* appended to it) to your -classpath, and your client can use the TransportClient. Java will pull in exactly what it needs and nothing it doesn't. And your client code stays tiny.

Works great for us!

Brian
Re: Import java transportClient only
David,

On each machine on which either ES or a client is deployed, we have the following directory, which contains all of the jars that are packaged with ES:

    /opt/db/current/elasticsearch-1.3.4/lib

Then the java command's -classpath includes /opt/db/current/elasticsearch-1.3.4/lib/* (along with our own custom jars via /opt/db/lib/*) and everything works fine.

As for additional 3rd-party jars, I have the following:

1. Jackson. The full library is used instead of the one inside ES.

2. Netty. This was needed for my own REST API, which hides ES and contains the business logic. I couldn't figure out how to easily use the shaded version inside ES, and the real Netty is as easy to use as falling off a log.

3. The LMAX Disruptor jar file. This thing combines nicely with Netty, and wow! Netty and application thread counts remain low even under heavy loads.

Everything else I get directly from ES. And I love the way it shades its versions of Netty and Jackson, so it's very easy for my own app to cherry-pick what it wants from ES and what it prefers outside of ES.

We could use maven, I suppose, but we don't. Instead, we package all of the jars into a zip archive after our application is built against a specific ES version. And then that single self-contained zip archive is installed where it is needed. And there is no need for an external or internal maven repo. Not a big deal for us.

All in all, it's much like how Elasticsearch itself is packaged and distributed: a zip archive that I download from the web site. I would never use a .deb or .rpm since the version that I want is always on the web site. And I believe there is a maven repo, but the .zip archive links are right on the web site, and we don't update all that often (regularly, but I don't thrash our deployment folks).

It sounds complicated, I suppose. But that was only once, and it's been easy to manage and develop against, easy to deploy, and makes me look very, very good to our deployment folks.

Brian

P.S. I don't use Guice or Spring. I don't see any problem with the new operator, and the services I create are fast, rock-solid, easy to configure and deploy, and that puts me light-years ahead of much of the pack. But this is another topic altogether! :-)

On Saturday, November 15, 2014 12:24:28 AM UTC-5, David Pilato wrote:

Hi Brian, I think I'm missing something. At the end you still have the full elasticsearch jars, right? What is the difference with having that as a maven dependency? Is it a way of not getting all the elasticsearch dependencies which are shaded in the elasticsearch jar, such as Jackson, Guice, ...?

David
Re: mapping store:true vs store:false
Especially when feeding log data via logstash, I have never used store:true and have found no need to specify it at all. The logstash JSON will be stored as the _source and retrieved by the query, so there is no need to use store at all. Anyway, that's my experience.

Brian
Re: Question on stemming + synonyms and tokenizerFactory
Once you have your mapping set up, then create an application that itself constructs the analyzer you need. Then feed it your real words and let it generate the stemmed versions. I don't think that ES can be told to do this, but it provides the classes you need to do it yourself.

For my own synonym processing, I do a Very Bad Thing. I create a synonym _type, and then each document contains a list of words or phrases that are synonyms of each other. For a synonym query, I first query my synonym type. Then I OR together the queries for each of the matching synonym words or phrases.

This is also much easier to maintain: I can update the synonyms on the fly and do not need to reindex the data at all. Not at all. But it requires additional code, and it works best using the Java API. And some folks have indicated there are serious performance issues, making this a Bad Solution. But I have not seen any problems with performance.

Oh, and all my words and phrases can be fully spelled out; it's only when they are used in the subsequent query that they get analyzed (tokenized, stemmed, and whatever else).

Brian
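A rough sketch of the two-phase lookup (not my production code; the index, type, and field names are made up for illustration):

    /* Phase 1: find the synonym docs that mention the user's term. Assumes
     * a "synonym" type whose documents hold a "words" string array. */
    SearchResponse synonyms = client.prepareSearch("synonyms")
        .setTypes("synonym")
        .setQuery(QueryBuilders.matchQuery("words", userTerm))
        .execute().actionGet();

    /* Phase 2: OR together a query for the term plus each synonym found */
    BoolQueryBuilder expanded = QueryBuilders.boolQuery()
        .should(QueryBuilders.matchQuery("body", userTerm));
    for (SearchHit hit : synonyms.getHits())
    {
      List<?> words = (List<?>) hit.getSource().get("words");
      for (Object word : words)
        expanded.should(QueryBuilders.matchQuery("body", word.toString()));
    }

    /* "expanded" is then used as the query against the real data index */

Since the synonym documents live in a plain index, updating them is just an ordinary document update, which is why nothing needs to be reindexed.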
Re: Disabling dynamic mapping
This is what I put into the elasticsearch.yml file when I start Elasticsearch for use in a non-ELK environment:

    # Do not automatically create an index when a document is loaded, and do
    # not automatically index unknown (unmapped) fields:
    action.auto_create_index: false
    index.mapper.dynamic: false

And here's a complete example of a curl input document that I use to create an index with the desired types, in which I don't want new indices, new types, or new fields to be automatically created:

    {
      "settings" : {
        "index" : {
          "number_of_shards" : 1,
          "analysis" : {
            "char_filter" : { },
            "filter" : {
              "english_snowball_filter" : {
                "type" : "snowball",
                "language" : "English"
              }
            },
            "analyzer" : {
              "english_standard_analyzer" : {
                "type" : "custom",
                "tokenizer" : "standard",
                "filter" : [ "standard", "lowercase", "asciifolding" ]
              },
              "english_stemming_analyzer" : {
                "type" : "custom",
                "tokenizer" : "standard",
                "filter" : [ "standard", "lowercase", "asciifolding", "english_snowball_filter" ]
              }
            }
          }
        }
      },
      "mappings" : {
        "_default_" : {
          "dynamic" : "strict"
        },
        "person" : {
          "_all" : { "enabled" : false },
          "properties" : {
            "telno" : { "type" : "string", "analyzer" : "english_standard_analyzer" },
            "gn" : { "type" : "string", "analyzer" : "english_standard_analyzer" },
            "sn" : { "type" : "string", "analyzer" : "english_stemming_analyzer" },
            "o" : { "type" : "string", "analyzer" : "english_stemming_analyzer" }
          }
        }
      }
    }

By the way, I never mix indices that are used for more standard database queries with the indices used by the ELK stack. Those are two separate Elasticsearch clusters entirely; the former is locked down as shown above, while the latter is left in its default free-form method of automatically creating indices and new fields on the fly, just as Splunk and ELK and other log analysis tools do.

I hope this helps.

Brian

On Monday, November 10, 2014 10:45:38 AM UTC-5, pulkitsinghal wrote:

What does the JSON in the curl request for this look like?

    The dynamic creation of mappings for unmapped types can be completely
    disabled by setting index.mapper.dynamic to false.
    http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-dynamic-mapping.html#mapping-dynamic-mapping

Thanks!
- Pulkit
Re: ES 1.3.4 scrolling never ends
A while back, I wrote my own post-query response sorting so that I could handle cases that Elasticsearch didn't. One case was sorting a scan query.

I used a Java TreeSet class and could also limit it to the top N (configurable) items. It is very, very quick, pretty much adding no overhead to the existing scan logic. And it supports an arbitrarily complex compound sort key, much like an SQL ORDER BY statement; it's very easy to construct.

Probably not useful for a normal user query, but it is very useful for an ad-hoc query in which I wish to scan across an indeterminately large result set but still sort the results.

One of these days, it might make a good plug-in candidate. But I am not sure how to integrate it with the scan API, so for now it's just part of the Java client layer.

Brian
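The core of the idea as a sketch (not the actual code; the compound sort key lives entirely in the comparator):

    import java.util.Comparator;
    import java.util.TreeSet;

    public class TopNCollector<T>
    {
      private final TreeSet<T> best;
      private final int limit;

      public TopNCollector(Comparator<T> order, int limit)
      {
        /* Note that a TreeSet treats compare() == 0 as a duplicate, so the
         * comparator must break ties (e.g. by document ID) to avoid
         * silently dropping hits */
        this.best = new TreeSet<T>(order);
        this.limit = limit;
      }

      /* Called once per scanned hit; O(log N) per insertion */
      public void offer(T hit)
      {
        best.add(hit);
        if (best.size() > limit)
          best.pollLast();   // drop the current worst
      }

      public TreeSet<T> results()
      {
        return best;
      }
    }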
Re: ES cluster become red
Moshe,

Exactly! What you might wish to do is add a Wait for Yellow query before doing any queries, or a Wait for Green request before doing any updates. That way, you can deterministically wait for the appropriate status before continuing.

For example, loop on the following until it succeeds, some timeout expires after repeatedly catching NoNodeAvailableException, or else some other serious exception is thrown:

    client.admin().cluster().prepareHealth().setTimeout(timeout)
        .setWaitForYellowStatus().execute().actionGet();

Hope this helps!

Brian

(A fuller sketch of that loop follows the quote below.)

On Sunday, November 9, 2014 8:22:58 AM UTC-5, Moshe Recanati wrote:

Update: after a couple of seconds or minutes the cluster became green. I assume this is after ES stabilized with data.
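The loop I have in mind, sketched out (the method name, attempt count, and retry delay are illustrative):

    /* Block until the cluster reaches at least yellow, retrying while no
     * node is reachable; give up after maxAttempts tries */
    ClusterHealthResponse waitForYellow(Client client, TimeValue timeout, int maxAttempts)
        throws InterruptedException
    {
      for (int attempt = 0; attempt < maxAttempts; attempt++)
      {
        try
        {
          return client.admin().cluster().prepareHealth()
              .setTimeout(timeout)
              .setWaitForYellowStatus()
              .execute().actionGet();
        }
        catch (NoNodeAvailableException e)
        {
          Thread.sleep(1000);   // cluster not reachable yet; retry
        }
      }
      throw new IllegalStateException("Cluster did not reach yellow status");
    }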
Re: get list of all the fields in indextype of index using java api
I do this by getting the mappings for a specific index, then isolating by type if desired. This takes care of all explicitly mapped fields, and also any automatically detected and mapped fields. Especially in the latter case, it's a good way to check and see if Elasticsearch is guessing your automatically mapped fields the way you expect.

Brian
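For example, a sketch against the 1.x Java API (the index and type names are illustrative; this only walks the top level, so recurse into object fields as needed):

    GetMappingsResponse resp = client.admin().indices()
        .prepareGetMappings("my_index").setTypes("my_type")
        .execute().actionGet();

    MappingMetaData mapping = resp.getMappings().get("my_index").get("my_type");

    /* sourceAsMap() returns the mapping as a Map and can throw IOException;
     * the "properties" entry holds the field definitions */
    @SuppressWarnings("unchecked")
    Map<String, Object> properties =
        (Map<String, Object>) mapping.sourceAsMap().get("properties");

    for (String field : properties.keySet())
      System.out.println(field);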
Re: ES 1.3.4 scrolling never ends
You need to get the scroll ID from each response and use that one in the subsequent scan search. You cannot simply reuse the same scroll ID.

Brian
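In other words, the scroll loop looks something like this (a sketch; the index name, size, and timeouts are illustrative):

    SearchResponse resp = client.prepareSearch("my_index")
        .setSearchType(SearchType.SCAN)
        .setScroll(TimeValue.timeValueMinutes(1))
        .setQuery(QueryBuilders.matchAllQuery())
        .setSize(100)   // per shard for a scan search
        .execute().actionGet();

    while (true)
    {
      /* Always pass the scroll ID from the most recent response */
      resp = client.prepareSearchScroll(resp.getScrollId())
          .setScroll(TimeValue.timeValueMinutes(1))
          .execute().actionGet();

      if (resp.getHits().getHits().length == 0)
        break;   // the scroll is exhausted

      for (SearchHit hit : resp.getHits())
      {
        // process each hit
      }
    }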
incomplete results?
hi, elasticsearch noob here, so forgive me if i get terminology wrong, etc.

basically i'm loading a bunch of documents into an elasticsearch index, currently sitting at 17,000. one of the fields i'm using is called md5, and it's not_analyzed in the mapping (also tried without, to solve this problem, to no avail).

when looking at the data in kibana, i've added a panel looking for the top N md5s and can see they have various values. however, when i select one of those (using the search icon), the number is actually higher than what was originally displayed in the panel (it was 10; selecting the top md5 actually shows 15). i've tried copying the exact queries from the 'inspect' options (showing all and the individual md5) and running them against elasticsearch using curl, and the same results show up.

i've tried renaming the md5 field to md5_hash and the same problem occurs.

i would appreciate any insight as to what may be happening here, as i've tried everything i can think of.

- brian
Re: Differences about label your fields with or without @ in Kibana
The @timestamp field, created by logstash by default, has always worked perfectly out of the box with Kibana's time picker and also with curator. Perhaps if you posted one document from your Elasticsearch response it might help. But I don't recommend that you create your own fields with @ as a prefix character.

Straying a bit from your question: I created some R scripts to analyze and plot things in a way that neither Kibana nor Splunk can. What I've noticed is that when I export as CSV, either from Elasticsearch or from Splunk, and then import into R's CSV reader:

1. Elasticsearch's @timestamp field becomes the X.timestamp field in R.

2. Splunk's _time field becomes the X_time field in R.

Which is one very good reason not to add a @ or _ to the front of your own fields. It's a lot of extra hard-coded processing to figure out the source and then choose the field using R when it's not the same name as the field from Elasticsearch. But I digress.

Brian

On Wednesday, October 29, 2014 1:20:10 PM UTC-4, Iván Fernández Perea wrote:

I was using Kibana and wondering what the differences are between using or not using an @ sign before field names. It seems that the default (as in the timepicker in the dashboard settings) is using the @ before a field, but it doesn't seem to work in my case. I need to set the Time Field in the Timepicker to a field name with no @ before it to make it work.

Thank you,
Iván.
Re: Need help to create array type field in elastic search
There is nothing special you need to add to your mapping to enable multiple values for a field. Just pass in an array of values instead of a single value, and all of the values are analyzed.

One thing you might want to add for string fields with multiple values:

    position_offset_gap : n

When a string field is analyzed, it typically assigns a position to each token that is one greater than the position of the previous token. By setting a position offset gap value of n, it skips ahead that many positions between consecutive values, representing the number of non-matching word positions between them. What this does is that if your field contains multiple values that each have multiple words, a phrase query won't span across values unless the slop value is large enough (n or larger, I seem to recall).

Hope this helps.

Brian
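For example, a sketch of such a mapping (the type and field names are illustrative):

    {
      "mappings": {
        "doc": {
          "properties": {
            "tags": {
              "type": "string",
              "position_offset_gap": 100
            }
          }
        }
      }
    }

With a gap of 100, a phrase query will not match across two consecutive array values unless its slop is set to 100 or more.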
Re: snapshot/restore
It looks like it was first introduced in 1.0.0.Beta2 (http://www.elasticsearch.org/downloads/1-0-0-beta2/, December 2, 2013):

    Snapshot/Restore API - Phase 1 #3826
    http://github.com/elasticsearch/elasticsearch/issues/issue/3826

This preceded 0.90.9, so I would suspect that it's in your 0.90.10 version as well.

Brian
Re: Logstash into Elasticsearch Mapping Issues
I haven't ever let logstash set the default mappings. Instead, whenever a logstash-style index is created, I let Elasticsearch set the default mappings from its template. That way, it works even if I replace logstash with something else.

For example, my $ES_CONFIG/templates/automap.json file is the following:

    {
      "automap" : {
        "template" : "logstash-*",
        "settings" : {
          "index.mapping.ignore_malformed" : true
        },
        "mappings" : {
          "_default_" : {
            "numeric_detection" : true,
            "_all" : { "enabled" : false },
            "properties" : {
              "message" : { "type" : "string" },
              "host" : { "type" : "string" },
              "UUID" : { "type" : "string", "index" : "not_analyzed" },
              "logdate" : { "type" : "string", "index" : "no" }
            }
          }
        }
      }
    }

And since logstash stores the entire message within the message field and I never modify that particular field, the _all field is disabled and Elasticsearch is told to use the message field as the default within a Kibana query via the following Java option when starting Elasticsearch as part of the ELK stack:

    -Des.index.query.default_field=message

I hope this helps!

Brian

On Thursday, October 2, 2014 9:02:17 PM UTC-4, elo...@gmail.com wrote:

Anyone have an idea what to do in a situation where I am using the output function in logstash to send data to an Elasticsearch cluster via protocol http, using a JSON template, and the mappings in the JSON template aren't being used in the Elasticsearch cluster?

logstash.conf:

    input {
      tcp {
        port => 5170
        type => "sourcefire"
      }
    }

    filter {
      mutate {
        split => ["message", "|"]
        add_field => {
          "event" => "%{message[5]}"
          "eventSource" => "%{message[1]}"
        }
      }
      kv {
        include_keys => ["dhost", "dst", "dpt", "shost", "src", "spt", "rt"]
      }
      mutate {
        rename => [ "dhost", "destinationHost" ]
        rename => [ "dst", "destinationAddress" ]
        rename => [ "dpt", "destinationPort" ]
        rename => [ "shost", "sourceHost" ]
        rename => [ "src", "sourceAddress" ]
        rename => [ "spt", "sourcePort" ]
      }
      date {
        match => ["rt", "UNIX_MS"]
        target => "eventDate"
      }
      geoip {
        add_tag => [ "sourceGeo" ]
        source => "src"
        database => "/opt/logstash/vendor/geoip/GeoLiteCity.dat"
      }
      geoip {
        add_tag => [ "destinationGeo" ]
        source => "src"
        database => "/opt/logstash/vendor/geoip/GeoLiteCity.dat"
      }
    }

    output {
      if [type] == "sourcefire" {
        elasticsearch {
          cluster => "XXX-cluster"
          flush_size => 1
          manage_template => true
          template => "/opt/logstash/lib/logstash/outputs/elasticsearch/elasticsearch-sourcefire.json"
        }
      }
    }

JSON template:

    {
      "template": "logstash-*",
      "settings": {
        "index.refresh_interval": "5s"
      },
      "mappings": {
        "Sourcefire": {
          "_all": { "enabled": true },
          "properties": {
            "@timestamp": { "type": "date", "format": "basicDateTimeNoMillis" },
            "@version": { "type": "string", "index": "not_analyzed" },
            "geoip": {
              "type": "object",
              "dynamic": true,
              "path": "full",
              "properties": {
                "location": { "type": "geo_point" }
              }
            },
            "event": { "type": "string", "index": "not_analyzed" },
            "eventDate": { "type": "date", "format": "basicDateTimeNoMillis" },
            "destinationAddress": { "type": "ip" },
            "destinationHost": { "type": "string", "index": "not_analyzed" },
            "destinationPort": { "type": "integer", "index": "not_analyzed" },
            "sourceAddress": { "type": "ip" },
            "sourceHost": { "type": "string", "index": "not_analyzed" },
            "sourcePort": { "type": "integer", "index": "not_analyzed" }
          }
        }
      }
    }
Re: Logstash into Elasticsearch Mapping Issues
I also have the following logstash output configuration:

    output {
      # For testing only
      stdout { codec => rubydebug }

      # Elasticsearch via HTTP REST
      elasticsearch {
        protocol => http
        codec => json
        manage_template => false

        # Or whatever target ES host is required:
        host => localhost

        # Or whatever _type is desired: usually the environment name,
        # e.g. qa, devtest, prod, and so on:
        index_type => sample
      }
    }

Brian
Re: Kibana 3.1.1
Link went away (404); now it's back, but still no release notes...

On Thursday, October 2, 2014 11:05:16 AM UTC-4, Brian wrote:

Looks interesting. But no release notes? http://www.elasticsearch.org/downloads/kibana-3-1-1/

Brian
Kibana 3.1.1
Looks interesting. But no release notes? http://www.elasticsearch.org/downloads/kibana-3-1-1/

Brian
Re: search for available fields
In that case, your strategy seems fine. ES has already done all the real work of creating the responses, and I would expect that iterating across them and gathering the fields into a Set should be rather quick.

However, you still might wish to get the mappings for the index. Why? Because once you've collected the subset of fields within your current response, you can still only search on the ones that are indexed. So for a general solution, you would perhaps want to skip over fields that are stored in the documents but not indexed.

Brian

On Tuesday, September 30, 2014 6:01:20 PM UTC-4, shooali wrote:

Thank you Brian. However, I am looking for something slightly different. I don't want to know all the fields for an index; I want to know, for a certain subset of the documents I have indexed, which fields are relevant to continue searching on. For example: if I have 5 fields total for my index, and I search for all documents that satisfy certain criteria (for example, all documents where FieldA equals '5'), then only for those documents, what are the available fields that I can continue to search on? The way I thought I could do this is to go over all results of the first search, collect all fields of those documents into a Set, and use those. My question is whether there is a better/more performant way to achieve this goal.
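[Editor's note] A minimal Java sketch of the collection step described above, assuming a TransportClient named client; the index name, field name, and criteria are hypothetical stand-ins for the example in the question:

import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;

public class FieldCollector {
    // Collect the distinct top-level field names present in the matching documents.
    public static Set<String> collectFields(Client client) {
        SearchResponse response = client.prepareSearch("myindex")
            .setQuery(QueryBuilders.termQuery("FieldA", "5"))
            .setSize(1000) // page with from/size, or scan/scroll for large result sets
            .execute().actionGet();

        Set<String> fields = new TreeSet<String>();
        for (SearchHit hit : response.getHits().getHits()) {
            Map<String, Object> source = hit.getSource();
            if (source != null) {
                fields.addAll(source.keySet());
            }
        }
        return fields;
    }
}

Nested objects would need a recursive walk of each source map, but the idea is the same.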
Re: Pagination in elasticsearch
The setSize method specifies the maximum number of responses per shard when the *_AND_FETCH search types are used. So if you query an index with N shards and your query sets a size of S, then the query can return up to N x S response hits. Since you're writing this in Java, it's a relatively easy matter to make further adjustments on the response, limiting it to the page size you expect.

Brian

On Thursday, September 25, 2014 6:56:44 PM UTC-4, Malini wrote:

I have:

SearchRequestBuilder srb = client.prepareSearch("cs").setTypes("csdl");
srb.setSearchType(SearchType.DFS_QUERY_AND_FETCH);
QueryBuilder qb = QueryBuilders.matchQuery("title", searchText);
FilterBuilder fb = FilterBuilders.andFilter(
    FilterBuilders.termsFilter("elasticdb", searchDB),
    // get from date and to date
    FilterBuilders.rangeFilter("pubdate").gte("1890-09").lte("2014-08")
);
FilteredQueryBuilder builder = QueryBuilders.filteredQuery(qb, fb);
FunctionScoreQueryBuilder functionbuilder = new FunctionScoreQueryBuilder(builder)
    .add(FilterBuilders.termsFilter("category", "acm"), factorFunction(-30.0f));
srb.setQuery(functionbuilder).setFrom(0).setSize(1);
SearchResponse response = srb.execute().actionGet();
SearchHit[] results = response.getHits().getHits();

Even though I set from=0 and size=1 (to see only one result), I see more than 1 result. How do we get this pagination working?

Thanks in advance
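[Editor's note] The per-shard multiplication only happens with the *_AND_FETCH search types. A minimal sketch of the alternative, assuming the same client, index, and type names as the question: with QUERY_THEN_FETCH (the default), from and size act as global pagination parameters.

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.Client;

public class PagedSearch {
    // With QUERY_THEN_FETCH, size is the total page size,
    // not a per-shard limit, so from/size paginate as expected.
    public static SearchResponse page(Client client, int from, int pageSize) {
        return client.prepareSearch("cs")
            .setTypes("csdl")
            .setSearchType(SearchType.DFS_QUERY_THEN_FETCH) // keeps the distributed (DFS) scoring
            .setFrom(from)
            .setSize(pageSize)
            .execute().actionGet();
    }
}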
Re: Searching on _all vs bool query on a (large) number of fields
I would not say diabolical. Perhaps not optimal, based on Lucene's internal design. But I do something similar with table-based synonyms.

In other words, when matching a synonym of a word, I do not pre-build the database index with synonyms. Instead, I maintain a table (index/type) of words and their synonyms, query that table, retrieve the synonyms, and then create the second and final query that basically does an OR search across the word and its synonyms. (It's basically a group of should clauses, just like yours.)

I find that performance is fine, and accuracy and usefulness are superior. For example, a user query for synonyms of the wild-carded BIG* might find BIG, LARGE, HUGE and also BIGHORN, SHEEP. And so on; some of the synonym lists are rather long, and with multiple words there are many should terms in the final query. And even with the multiple queries (first to resolve the synonyms, and second to OR across them), performance is remarkably fast.

It might be pushing Lucene a little, but I like the improved accuracy, and the ability to easily and regularly modify my synonym lists without any need to rebuild the hundreds of millions of documents that I am querying.

So for your question, my suggestion is to go for it; it should perform well enough.
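[Editor's note] A minimal Java sketch of the two-pass approach described above. The index names ("synonyms", "documents") and field names ("word", "synonyms", "text") are illustrative assumptions, not the poster's actual schema:

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;

public class SynonymExpander {
    public static SearchResponse searchWithSynonyms(Client client, String word) {
        // Pass 1: look up the word's synonyms in the synonym table.
        SearchResponse synResponse = client.prepareSearch("synonyms")
            .setQuery(QueryBuilders.termQuery("word", word))
            .execute().actionGet();

        // Pass 2: OR the word and each synonym together as should clauses.
        BoolQueryBuilder bool = QueryBuilders.boolQuery()
            .should(QueryBuilders.matchQuery("text", word));
        for (SearchHit hit : synResponse.getHits().getHits()) {
            Object syns = hit.getSource().get("synonyms"); // assumes a JSON array field
            if (!(syns instanceof Iterable)) continue;
            for (Object synonym : (Iterable<?>) syns) {
                bool.should(QueryBuilders.matchQuery("text", synonym.toString()));
            }
        }
        return client.prepareSearch("documents")
            .setQuery(bool)
            .execute().actionGet();
    }
}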
Checking for tampering of indices
In Splunk, it is possible to detect tampering of logs. Splunk will take an event at ingestion time and create a hash value based on the event and your certificates/keys. You can then write searches that re-hash the event and compare it to the original hash, to indicate whether anything has changed. We need something like that. How is that possible with Elasticsearch?
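[Editor's note] Elasticsearch itself has no built-in event hashing, but the same idea can be applied client-side before indexing. A minimal sketch under that assumption: sign each raw event with a keyed hash at ingest time, store the signature in its own field, and re-compute it later to detect changes. All names here are hypothetical:

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class EventSigner {
    // Compute an HMAC-SHA256 signature of the raw event text. Store the
    // result in a field (e.g. "signature") alongside the event at index
    // time; later, re-compute it over the stored event and compare.
    public static String sign(String event, byte[] secretKey) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secretKey, "HmacSHA256"));
        byte[] digest = mac.doFinal(event.getBytes("UTF-8"));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}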
Re: Kibana server-side integration with R, Perl, and other tools
Lance,

Thanks for the clarification. Yeah, the consensus seems to be to either issue the same REST command off-line (not available to Windows PMs, since I am not going to touch Windows with a pole shorter than 25m :-), or to write a server plug-in (which would allow even Windows users to invoke the scripts).

But one question: When I click on the Info button near the upper right of a panel, it shows the JSON request as invoked by curl. But that's only a suggestion, right? In other words, my browser is not using curl? I've run into issues with curl's buffer limitations with large queries, and am hoping that Kibana is only giving me a suggestion to use curl, but isn't telling my browser to use curl.

Brian

On Friday, September 26, 2014 2:51:38 PM UTC-4, Lance A. Brown wrote:

On 2014-09-25 11:57 am, Brian wrote: And as my part of the bargain, I will use Perl, R, or whatever else is at my disposal to create custom commands that can run on the Kibana host and perform all of the analysis that our group needs.

Something to remember: The Kibana host is your browser. The current version of Kibana runs entirely within the browser, making calls to Elasticsearch for data, processing it, and generating graphs all within the browser. There is no server-side operating component, just static files that get loaded into your browser.
Re: elasticsearch blocked futex
Chris,

This sounds very suspiciously like a problem we had. We set up an experimental local ELK server (one node in the cluster) and fed it with logstash. I was manually cleaning up older data using the Elasticsearch Head plug-in, but over one weekend the cluster got into a funky state. The curl API said it was Yellow, but ES Head showed Green, and queries were hanging.

This was a VM that was dedicated to ES with 1TB disk space (only about 2% was ever used at any point in time), 4 CPUs, and 24GB RAM (though the Java JVM was not tuned to take advantage of all of this memory). Kibana was hosted as a site plug-in, but its usage was very light. Though I had been playing around with increasing the size limit of responses way past the default of 500, and I'm sure the ES server bore the brunt of that.

I stopped and restarted ES and everything went back to normal. I installed Curator to clean up older indices automatically, and the problem has never returned. (I have also stopped telling Kibana to ask for up to 5 response documents on a query!)

I suspect you're getting some sort of OOM condition, and that's when things start looking odd. Anyway, OOM is just a wild guess. I wouldn't have mentioned something so nebulous, but the symptoms you have are strikingly close to the ones we saw.

Brian
Re: search for available fields
The following query will return all of the mappings for all of the indices on the specified ES host:

$ curl -XGET 'http://hostname:9200/_all/_mapping?pretty=true'; echo

You can read the JSON, or else parse it and extract the details you need. For example, if you have automatic mapping enabled, this is a very good way to not only discover the searchable fields but also see whether they are strings or numbers.

Brian

On Tuesday, September 30, 2014 11:15:32 AM UTC-4, shooali wrote:

Hi, what is the most efficient way to get all available fields to search on for a preliminary search criteria?

Thanks,
Shooali
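[Editor's note] For the parsing step, a minimal Jackson sketch that walks the _mapping response and prints each index's top-level field names. The localhost URL and the data-binding approach are assumptions, not part of the original post:

import java.net.URL;
import java.util.Iterator;
import java.util.Map;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class MappingFields {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        JsonNode root = mapper.readTree(new URL("http://localhost:9200/_all/_mapping"));

        // ES 1.x response shape: { index: { "mappings": { type: { "properties": {...} } } } }
        Iterator<Map.Entry<String, JsonNode>> indices = root.fields();
        while (indices.hasNext()) {
            Map.Entry<String, JsonNode> index = indices.next();
            System.out.println("index: " + index.getKey());
            JsonNode types = index.getValue().has("mappings")
                ? index.getValue().get("mappings") : index.getValue();
            for (JsonNode type : types) {
                JsonNode properties = type.get("properties");
                if (properties == null) continue;
                Iterator<String> fields = properties.fieldNames();
                while (fields.hasNext()) {
                    System.out.println("  field: " + fields.next());
                }
            }
        }
    }
}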
Re: How does logstash chose which timestamped index to use?
Matt,

Assuming your logstash configuration correctly sets the @timestamp field, logstash will store the document in the day index that is specified by the @timestamp field. I have verified this behavior by observation over the time we have been using the ELK stack.

For example, we have a Perl CGI script that is used to emulate a customer service. It has a hard-coded ISO-8601 date string which our logstash configuration finds before it notices the syslog date. And so that log entry ends up in the day in the past that the hard-coded string specifies. And then curator cleans it up each and every day.

Bottom line: logstash already respects the day in @timestamp when storing data in ES.

Brian

On Tuesday, September 30, 2014 2:31:59 PM UTC-4, Matt Hughes wrote:

I have a logstash-forwarder client sending events through lumberjack into elasticsearch, using timestamped logstash indices. How does logstash decide which day index to put the document in? Does it look at @timestamp? @timestamp is just generated when the document is received, correct? So if you logged an event on a client at 11 pm UTC but it didn't make it to elasticsearch until 1 am UTC the next day, which index would it go in? Would it go in the day it was created, or the day it got to elasticsearch? If the latter, is there a way to force logstash to respect a date field in the original log event?
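[Editor's note] The piece that makes this work is logstash's date filter, which overwrites @timestamp from a parsed field. A minimal sketch, assuming a previously extracted field named logdate in ISO-8601 form (both names are illustrative):

filter {
  date {
    # Parse the original event time and write it into @timestamp,
    # so the event lands in the index for the day it occurred.
    match  => [ "logdate", "ISO8601" ]
    target => "@timestamp"
  }
}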
Re: Kibana server-side integration with R, Perl, and other tools
Thanks, Jörg. I will need to find some time to look into this, as it seems exactly like what I was looking for. Thanks again!

Brian

On Monday, September 29, 2014 12:21:00 PM UTC-4, Jörg Prante wrote:

It is quite easy to add a wrapper as a plugin in ES in the REST output routine around search responses; see https://github.com/jprante/elasticsearch-arrayformat or https://github.com/jprante/elasticsearch-csv

If the CSV plugin has deficiencies, I would like to get feedback on what is missing/what can be added. With a bit of hacking, it is possible to write ES plugin(s) that can trigger the creation of graphviz, gnuplot, R etc. plots instead of delivering temporary CSV files.

Jörg
Re: Kibana server-side integration with R, Perl, and other tools
Ash,

JSON is natural for Kibana's Javascript to read and therefore to emit as CSV. So what I was really asking is whether Kibana is going to become a serious contender and allow user-written commands to be inserted into the pipeline between data query/response and charting.

After my few weeks with R, I have gotten it to far exceed GNUPlot for plotting (even with the base plotting functions; I haven't yet dived into the ggplot2 package), and to also far exceed Kibana. For example, setting up a custom dashboard is tedious, and it's not easily customizable.

Now, I am not suggesting that the ELK stack turn into Splunk directly. But since it wants to become a serious contender, I am strongly recommending that the ELK team take the next step and allow a user-written command to be run against the Kibana output and its charting. And I recommend that the output be CSV, because that's what R supports so naturally. And with R, I can build out custom analysis scripts that are flexible (and not hard-coded like Kibana dashboards).

For example, I have an R script that gives me the most commonly used functions that the Splunk timechart command offers, and with all of its customizability: selecting the fields to use in the analysis, the "by" field (for example, plotting response time by host name), the statistics (mean, max, 95th percentile, and so on), even splitting the colors so that the plot instantly shows the distribution of load across 10 hosts that reside within two data centers.

This is an excellent (and free) book that shows what Splunk can do by way of clear examples: http://www.splunk.com/goto/book

Again, I don't suggest that Kibana duplicate this. But I strongly suggest that Kibana give me a way to insert my own commands into the processing so that I can implement the specific functions that our group requires, and can do it without my gorpy Perl script and copy-paste command mumbo-jumbo, and instead in a much more friendly and accessible way that even the PMs can run from their Windows laptops without touching the command line. And as my part of the bargain, I will use Perl, R, or whatever else is at my disposal to create custom commands that can run on the Kibana host and perform all of the analysis that our group needs.

Brian

On Wednesday, September 24, 2014 4:34:43 PM UTC-4, Ashit Kumar wrote:

Brian,

I like the direction you are going down and am trying to do that myself. However, being a perl fledgling, I am still battling Dumper etc. I would appreciate it if you could share your code to convert an ES query to CSV. I want to use aggregations and print/report/graph results. Kibana is very pretty and does the basics well, but I want to know who used web mail and order it by volume of data sent by hour of day, and either graph / tabulate / csv out the result. I just can't see how to do that with Kibana.

Thanks
Ash
How to fix IndexMissingException
I recently ran into an issue where my cluster is reporting an IndexMissingException. I tried deleting the faulty index, but I keep getting the same error returned. How do I fix this problem?

$ curl -XDELETE 'http://localhost:9200/logstash-2014.09.04.11'
{"error":"IndexMissingException[[logstash-2014.09.04.11] missing]","status":404}
Re: Using elasticSearch as repository for UDP published Ceilometer data through logstash and exception is being thrown 'invalid version format'
Bump... Anyone???

On Friday, August 29, 2014 11:28:01 AM UTC-4, Brian Callanan wrote:

Hi, need a little help. I'm using OpenStack Ceilometer and I've configured it to push metered data over UDP to a host:port. I installed logstash and configured it to receive the UDP data from Ceilometer using the msgpack codec. This works great! Really! Now I'm trying to stuff the data on output into Elasticsearch, and it's getting an exception when pushing data in. Pushed data throws the following from Elasticsearch:

[2014-08-29 11:05:08,646][WARN ][http.netty] [Amphibian] Caught exception while handling client http traffic, closing connection [id: 0x7d45e4d7, /127.0.0.1:53745 => /127.0.0.1:9200]
java.lang.IllegalArgumentException: invalid version format: LOGSTASH-LINUX-CAL-13046-2010L9O160SXTFILI-RJ6DDVLG LINUX-CAL 10.2.3.23
    at org.elasticsearch.common.netty.handler.codec.http.HttpVersion.<init>(HttpVersion.java:102)
    at org.elasticsearch.common.netty.handler.codec.http.HttpVersion.valueOf(HttpVersion.java:62)
    at org.elasticsearch.common.netty.handler.codec.http.HttpRequestDecoder.createMessage(HttpRequestDecoder.java:75)
    at org.elasticsearch.common.netty.handler.codec.http.HttpMessageDecoder.decode(HttpMessageDecoder.java:189)
    at org.elasticsearch.common.netty.handler.codec.http.HttpMessageDecoder.decode(HttpMessageDecoder.java:101)
    at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:500)
    at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
    ...

Can anyone shed any light on why the exception is being thrown?

My Elasticsearch version:

brian.callanan@linux-cal 143 % ./elasticsearch -v
Version: 1.3.2, Build: dee175d/2014-08-13T14:29:30Z, JVM: 1.7.0_40

My logstash version:

brian.callanan@linux-cal 159 % logstash -V
logstash 1.4.2

My logstash conf:

input {
  udp {
    codec => msgpack     # codec (optional), default: plain
    port  => 40001       # number (required)
    type  => ceilometer  # string (optional)
  }
}

output {
  elasticsearch {
    host  => localhost
    port  => 9200
    codec => json
  }
  stdout { codec => rubydebug }
}

A sample event:

{
  "counter_name" => "network.incoming.bytes.rate",
  "resource_id" => "instance-0017-bec82aeb-b06a-4569-8b91-fcb6acd491e0-tap06349b1b-2d",
  "timestamp" => "2014-08-29T13:49:12Z",
  "counter_volume" => 8285.0,
  "user_id" => "cbf803c4aeb6415eb492c04ed8debe2c",
  "message_signature" => "e96ade5e06e1ec903e459f4c8a383413d1058bda0c1f7546dea62800e5f289f8",
  "resource_metadata" => {
    "name" => "tap06349b1b-2d",
    "parameters" => {},
    "fref" => nil,
    "instance_id" => "bec82aeb-b06a-4569-8b91-fcb6acd491e0",
    "instance_type" => "3422a1d6-d61c-4577-9d38-47e1b25e8ad3",
    "mac" => "fa:16:3e:a5:82:09"
  },
  "source" => "openstack",
  "counter_unit" => "B/s",
  "project_id" => "e7a434ef0aa549c9824d963029a02454",
  "message_id" => "4210ce68-2f83-11e4-9f59-f01fafe5cc22",
  "counter_type" => "gauge",
  "@version" => "1",
  "@timestamp" => "2014-08-29T13:49:12.410Z",
  "tags" => [],
  "type" => "ceilometer",
  "host" => "10.2.24.7"
}
Multi-field collapsing
I have a use case which requires collapsing on multiple fields. As a simple example, assume I have some movie documents indexed with the fields: Director, Actor, Title, Release Date. I want to be able to collapse on Director and Actor, getting the most recent movie (as indicated by Release Date).

I think the new top hits aggregation almost gets me what I need. I can create a terms aggregation on Director, with a sub terms aggregation on Actor, and add a top hits aggregation to that (size 1). Would this be the proper approach?

By traversing over the aggregations I can get all of the hits that I want; however, I can't (have elasticsearch) sort or page them. It's almost like I'd need a hitCollector aggregation which would collect all search hits generated by its sub aggregations and allow me to specify sort and paging information at that level.

Thoughts?

Brian
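[Editor's note] For reference, a minimal sketch of the aggregation shape described above, assuming the movie fields are mapped as director, actor, and release_date (hypothetical names):

{
  "size": 0,
  "aggs": {
    "by_director": {
      "terms": { "field": "director" },
      "aggs": {
        "by_actor": {
          "terms": { "field": "actor" },
          "aggs": {
            "latest_movie": {
              "top_hits": {
                "size": 1,
                "sort": [ { "release_date": { "order": "desc" } } ]
              }
            }
          }
        }
      }
    }
  }
}

As the post notes, the buckets come back grouped per director/actor pair; sorting or paging across the collected top hits still has to happen client-side.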
Kibana server-side integration with R, Perl, and other tools
Is there some existing method to integrate processing between the Kibana/Elasticsearch response JSON and the graphing?

For example, I have a Perl script that can convert an Elasticsearch JSON response into a CSV, even reversing the response to put the oldest event first (for gnuplot compatibility). I then have an R script that can accept a CSV and perform custom statistical analysis on it. It can even auto-detect the timestamp and ordering and reverse the CSV events (adapting without change to either an Elasticsearch response as CSV, or a direct CSV export from Splunk).

I've shown the process to a few people, but all balk outright or else shy away politely at the thought of going to Kibana's Info button, copying and pasting the curl-based query, and then running it along with the Perl CSV conversion script and R processing script from the command line. And I can't blame them!

It may be that Kibana already has the capability to pipe data through server-installed commands and scripts, but my lack of Javascript experience and lack of Kibana internals expertise doesn't seem to help me discover it. Or perhaps this would be a great new addition to Kibana:

1. Allow a server-side command to sit in the middle, between the response and the charting.
2. Deliver the response as a CSV with headers, including the @timestamp field of course, to the server-side command, along with the appropriate arguments and options for the particular panel.
3. Document the graphite / graphviz / other format required to display the plots.

Just a thought.

Brian
Terms Filter Assistance
We have 2 indices (logs and intel) and are trying to search 2 fields in the logs index (src and dst) for any match from the intel index's ip field. The challenge is that the terms filter expects one document containing all the values to be searched for, and the intel index has over 150k documents. Is there a way to extract the ip field from the intel index (aggregations maybe) and use that to search the src and dst fields in the logs index?

Here is the code I am trying to use:

curl -XGET localhost:9200/logs/_search -d '{
  "query" : {
    "filtered" : {
      "filter" : {
        "terms" : {
          "src" : {
            "index" : "intel",
            "type" : "ipaddress",
            "id" : "*",
            "path" : "ip"
          },
          "dst" : {
            "index" : "intel",
            "type" : "ipaddress",
            "id" : "*",
            "path" : "ip"
          }
        }
      }
    }
  }
}'
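[Editor's note] For comparison, the documented form of the terms-lookup filter targets a single field and a single concrete document id; a wildcard id as shown above is not supported. A sketch of the supported shape, with a hypothetical document all-ips of a hypothetical type iplist holding the array of values:

curl -XGET localhost:9200/logs/_search -d '{
  "query" : {
    "filtered" : {
      "filter" : {
        "terms" : {
          "src" : {
            "index" : "intel",
            "type" : "iplist",
            "id" : "all-ips",
            "path" : "ip"
          }
        }
      }
    }
  }
}'

Matching both src and dst would take two such terms filters combined under an or (or bool should) filter.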
Re: bulk thread pool rejections
I found out that the rejections on ES are retried by logstash after a short delay. Increasing the queue by too much costs more memory in ES, which takes away from merges, searches, etc. I increased threadpool.bulk.queue_size from 50 to 100, and I see no lost messages due to the rejections.

From: Robert Gardam <robert.gar...@fyber.com>
Date: Thursday, August 14, 2014 at 5:55 AM
To: elasticsearch@googlegroups.com
Subject: Re: bulk thread pool rejections

Did you resolve this issue? I was seeing the exact same thing in my setup. I also have my bulk messages set to 5k in logstash. Originally I had set the thread pool to unlimited, but this apparently causes some strange issues with stability.

On Tuesday, April 8, 2014 5:00:32 PM UTC+2, shift wrote:

I tried lowering the logstash threads, but I am unable to keep up with the incoming message rate. It is important that I index messages in real time, but equally important that I am not losing messages. :) To keep indexing real time I need 200 logstash output threads with a flush size of 5000 sending bulk messages to each node in the elasticsearch cluster, but I am concerned that I am losing messages with these rejections. I increased the queue size to 500; I will see if this helps.

On Wednesday, April 2, 2014 11:34:43 AM UTC-4, Drew Raines wrote:

shift wrote:

I am seeing a high number of rejections for the bulk thread pool on a 32 core system. Should I leave the thread pool size fixed to the # of cores and the default queue size at 50? Are these rejections re-processed? From my clients sending bulk documents (logstash), do I need to limit the number of connections to 32? I currently have 200 output threads to each elasticsearch node.

The rejections are telling you that ES's bulk thread pool is busy and it can't enqueue any more to wait for an open thread. They aren't retried. The exception your client gets is the final word for that request. Lower your logstash threads to 16 or 32, monitor rejections, and gradually raise. You could also increase the queue size, but keep in mind that's only useful to handle spikes. You probably don't want to keep thousands around waiting since they take resources.

Drew

bulk : {
    threads : 32,
    queue : 50,
    active : 32,
    rejected : 12592108,
    largest : 32,
    completed : 584407554
}

Thanks! Any feedback is appreciated.
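[Editor's note] The queue bump mentioned at the top of this thread is an elasticsearch.yml setting. A minimal sketch, assuming ES 1.x defaults (a fixed bulk pool sized to the core count):

threadpool.bulk.queue_size: 100

It takes effect on restart (or via a cluster settings update, where supported); as Drew notes above, a bigger queue only absorbs spikes and holds memory, so it is not a substitute for throttling the clients.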
Re: Some observations with Curator
Aaron,

Well, now I feel a little foolish. Perhaps it was from my initial attempt to put --logfile at the end of the command instead of before the action:

$ curator delete --older-than 8 --logfile /tmp/curator.log
usage: curator [-h] [-v] [--host HOST] [--url_prefix URL_PREFIX] [--port PORT]
               [--ssl] [--auth AUTH] [-t TIMEOUT] [--master-only] [-n] [-D]
               [--loglevel LOG_LEVEL] [-l LOG_FILE] [--logformat LOGFORMAT]
               {show,allocation,alias,snapshot,close,bloom,optimize,delete} ...
curator: error: unrecognized arguments: --logfile /tmp/curator.log

So I changed it to -l before I moved it, based on the error message above. But you're correct: it does accept both forms of the option:

# For testing: works fine and stores the log in /tmp/curator.log
$ curator --logfile /tmp/curator.log delete --older-than 8

# Older CentOS server; it's 2.7.5 on my MacBook (Mavericks) and
# HP laptop (Ubuntu 14.04 LTS):
$ python --version
Python 2.6.6

# Latest released version:
$ curator --version
curator 1.2.2

Brian

On Tuesday, August 5, 2014 8:18:24 PM UTC-4, Aaron Mildenstein wrote:

Hmm. What version of python are you using? I am able to use --logfile or -l interchangeably.

I'm glad you like Curator, and I like KELTIC :) Nice acronym.
Re: transport client? really?
Here is my experience; yours may vary.

I also use the TransportClient. And then I wrap our business rules behind another server that offers an HTTP REST API but talks to Elasticsearch on the back end via the TransportClient. This server uses Netty and the LMAX Disruptor to provide low-resource high-throughput processing; it is somewhat like Node.js, but in Java instead of JavaScript.

Then I have a bevy of command-line maintenance and test tools that also use the TransportClient. I wrap them inside a shell script (for example, Foobar.main is wrapped inside foobar.sh) and convert command-line options (such as -t person) into Java properties (such as TypeName=person), and also set the classpath to all of the Elasticsearch jars plus all of mine.

Whenever there is a compelling change to Elasticsearch, I upgrade, and many times I have watched my Java builds fail with all of the breaking changes. But even with the worst of the breaking changes, it was down for maybe a day or two at the most; the API is rather clean, and this newsgroup is a life saver, and so I never got stuck. And when I was done, I had learned even more about the ES Java API. So it's either a huge pain or it's the joy of learning, depending on your point of view. I have always viewed it as the joy of learning.

I just wish the Facets-to-Aggregations migration was smoother. But I sense that there will be another breaking change on my horizon. This will be particularly sad for me, as I had implemented a rather nice hierarchical term frequency combining mvel and facets, which are now both deprecated and on the list to be removed. But again, I'll learn a lot when making the migration.

I believe it was Thomas Edison who said that most people miss opportunities because the opportunities come dressed in overalls and look like work. But I digress :-)

Brian
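[Editor's note] A minimal sketch of the kind of wrapper script described above; the paths, option, and class name are hypothetical:

#!/bin/sh
# foobar.sh: hypothetical wrapper around the Foobar tool.
# Converts -t <type> into the Java property TypeName and builds the
# classpath from the Elasticsearch jars plus our own.
TYPE=person
while getopts "t:" opt; do
  case "$opt" in
    t) TYPE="$OPTARG" ;;
  esac
done
CLASSPATH="/opt/es/lib/*:/opt/mytools/lib/*"
exec java -cp "$CLASSPATH" -DTypeName="$TYPE" Foobar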
Some observations with Curator
Using the most recent release (1.2.2) of Curator, I noticed that the documentation says --logfile while curator itself rejects --logfile anywhere and requires -l in front of the other options to direct its log entries. No big deal; I just tested it until it worked before adding it to the cron job. And it is working superbly.

We will be standing up several ELK instances in various QA data centers to analyze several independent product load tests. These ELK instances are also independent, as we do not wish to flood the logstash data across any of our inter-data-center VPN / router connections. And because they are independent, our operations folks are leery of manually keeping track of multiple instances of the ELK stack with which they have no familiarity.

And so, Elasticsearch Curator is becoming an integral part of the automation of the ELK stack for us, as it helps to keep our hard-working operations folks from overload. We wish for ELK to be an asset and not an added drain on time and effort, and Curator is a vital part of that goal. To the point where I no longer think of it as simply the ELK stack, but rather the KELTIC stack: Kibana, Elasticsearch, Logstash, Time-based Indices, Curator.

But whether ELK or KELTIC, the stack is awesome! Many thanks to all who contributed and who continue to drive it forward!

Brian
Re: Python version for curator
An update: I have installed curator 1.2.2 by downloading the zip archive, unpacking it, and then installing it directly:

$ cd curator-1.2.2
$ sudo python setup.py install

Not sure if it's the fix since the previous version of curator, or else the pip-less install. But either way, it's working fine, just as expected. And it works superbly!

Brian
Re: Failure to execute ttl purge
What version of Elasticsearch? Of Java? How is TTL being used?

For example, one extreme is to constantly add log data and then delete old data. This case is, of course, best handled with time-based indices and a tool such as curator to delete old data by index, and not by individual document via TTL.

I have run some test cases with TTL using Elasticsearch 1.3.0 and Java 7u60. I set a _ttl value of 5m and hammered Elasticsearch. After some time passed and several million documents had been added, I shut down the test and watched the TTL processing clean up. It took some time, but it always succeeded. It was rather nice to see that my TTL tests were self-cleaning: I always ended up with an empty index after each run.

This discussion may also shed a bit of light: http://elasticsearch-users.115913.n3.nabble.com/TTL-Load-Problems-td4024001.html

Brian
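[Editor's note] For reference, a minimal mapping sketch that enables TTL with a 5m default, as in the test described above; the type name "event" is hypothetical:

{
  "mappings" : {
    "event" : {
      "_ttl" : { "enabled" : true, "default" : "5m" }
    }
  }
}

Documents of that type are then purged by the background TTL reaper once they expire.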
Re: Node Client with bulk request indefinitely blocked thread when ClusterBlockException is being thrown
Alex,

By the way, is this bug seen with the TransportClient also, or just the NodeClient?

Thanks!

Brian

On Monday, August 4, 2014 4:27:35 AM UTC-4, Alexander Reelsen wrote:

Hey, just a remote guess without knowing more: on your client side, the exception is wrapped, so you need to unwrap it first.

--Alex

On Wed, Jul 23, 2014 at 9:47 AM, Cosmin-Radu Vasii <cosminra...@gmail.com> wrote:

I am using the dataless NodeClient to connect to my cluster (version is 1.1.1). Everything is working OK, except when failures occur. The scenario is the following:

- I have a Java-based application which connects to the ES cluster (the application is started and the cluster is up and running)
- I shut down the cluster
- I try to send a bulk request
- The following exception is displayed in the logs, which is normal. But my call never catches the exception:

Exception in thread "elasticsearch[Lasher][generic][T#6]" org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];[SERVICE_UNAVAILABLE/2/no master];
    at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:138)
    at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:128)
    at org.elasticsearch.action.bulk.TransportBulkAction.executeBulk(TransportBulkAction.java:197)
    at org.elasticsearch.action.bulk.TransportBulkAction.access$000(TransportBulkAction.java:65)
    at org.elasticsearch.action.bulk.TransportBulkAction$1.onFailure(TransportBulkAction.java:143)
    at org.elasticsearch.action.support.TransportAction$ThreadedActionListener$2.run(TransportAction.java:117)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)

My code is something like this:

BulkResponse response;
try {
    response = requestBuilder.execute().actionGet();
} catch (NoNodeAvailableException ex) {
    LOGGER.error("Cannot connect to ES Cluster: " + ex.getMessage());
    throw ex;
} catch (ClusterBlockException ex) {
    LOGGER.error("Cannot connect to ES Cluster: " + ex.getMessage());
    throw ex;
} catch (Exception ex) {
    LOGGER.error("Exception in processing indexing request by ES server. " + ex.getMessage());
}

When I use a single request, everything is OK. I also noticed a TODO in the ES code in TransportBulkAction.java:

private void executeBulk(final BulkRequest bulkRequest, final long startTime,
        final ActionListener<BulkResponse> listener, final AtomicArray<BulkItemResponse> responses) {
    ClusterState clusterState = clusterService.state();
    // TODO use timeout to wait here if its blocked...
    clusterState.blocks().globalBlockedRaiseException(ClusterBlockLevel.WRITE);
}

Is this a known situation or a known bug, or am I missing something?
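[Editor's note] On Alex's unwrapping point, a minimal sketch of what that can look like with the Java client; whether it resolves this particular NodeClient case is exactly what the thread is asking, so treat it as an assumption:

import org.elasticsearch.ElasticsearchException;
import org.elasticsearch.ExceptionsHelper;
import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.cluster.block.ClusterBlockException;

public class SafeBulk {
    // Execute a bulk request, unwrapping transport-level wrappers so a
    // ClusterBlockException can actually be recognized and handled.
    public static BulkResponse execute(BulkRequestBuilder requestBuilder) {
        try {
            return requestBuilder.execute().actionGet();
        } catch (ElasticsearchException ex) {
            Throwable cause = ExceptionsHelper.unwrapCause(ex);
            if (cause instanceof ClusterBlockException) {
                System.err.println("Cluster is blocked: " + cause.getMessage());
            }
            throw ex; // rethrow (unchecked) for the caller
        }
    }
}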
Re: Creating elasticsearch index mandatory?
By default, Elasticsearch automatically creates an index if a document is being added and the index doesn't already exist. Logstash automatically specifies a time-based index with day precision for each log entry. In other words:

logstash-2014.07.28
logstash-2014.07.29
logstash-2014.07.30
logstash-2014.07.31
logstash-2014.08.01
logstash-2014.08.02
logstash-2014.08.03
logstash-2014.08.04

And Kibana's time picker automatically assumes the logstash defaults, so you should be good to go.

One thing that initially tripped me up, and might trip you up: when I first ran Kibana I didn't see any of my data. But that's because I had loaded some test data into it, and the default time picker only went back a few minutes into the past.

Brian

On Monday, August 4, 2014 4:03:05 PM UTC-4, Acche Din wrote:

Hello all, I have an ELK setup 'out of the box'. My goal is to parse apache logs via logstash and display them in kibana. I would like to know if it is mandatory to create an index on elasticsearch so as to store the results from the apache logs (I have logstash.conf output => elasticsearch).
Re: SIREn plugin for nested documents
Thanks for the link. Unfortunately, Chrome on Mac OS (latest versions of each) causes this web page to blank and redisplay continually. Can't read it; hope you can.

In a previous life, I created a search engine that handled parent/child relationships with blindingly fast performance. One trick was that the index didn't just contain the document ID, but the entire hierarchy of IDs. For example (and for brevity, the IDs are single letters):

Document ID and relationship    Fully qualified and indexed ID
----------------------------    ------------------------------
A                               A
B (child of A)                  A.B
C (child of B)                  A.B.C
D (child of A)                  A.D
E (child of D)                  A.D.E
F (child of D)                  A.D.F

So, for example, just by looking at and comparing the fully qualified IDs, it was nearly instantaneous to determine that A and F are in the same parent-child hierarchy, with F being a child of D and a grandchild of A; that E and F are siblings under the same parent; and so on.

Not sure how this would mesh with Lucene, though. But complex parent-child relationships could be intersected just by the fully qualified IDs that came out of the inverted index. Documents did not need to be fetched or cached to perform this operation, and the result was breathtakingly, blindingly fast performance.

Just FYI. I can discuss off-line if anyone wishes.

Brian
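[Editor's note] A minimal Java sketch of the ID-comparison trick described above. The dot-separated path encoding follows the table; everything else is an assumption:

public class HierarchyIds {
    // True if "ancestor" is an ancestor of "descendant",
    // e.g. isAncestor("A", "A.D.F") and isAncestor("A.D", "A.D.F").
    static boolean isAncestor(String ancestor, String descendant) {
        return descendant.startsWith(ancestor + ".");
    }

    // True if the two IDs share the same immediate parent,
    // e.g. areSiblings("A.D.E", "A.D.F"). Root-level IDs have no parent here.
    static boolean areSiblings(String a, String b) {
        int i = a.lastIndexOf('.');
        int j = b.lastIndexOf('.');
        return i > 0 && i == j && !a.equals(b)
            && a.substring(0, i).equals(b.substring(0, j));
    }

    public static void main(String[] args) {
        System.out.println(isAncestor("A", "A.D.F"));       // true
        System.out.println(areSiblings("A.D.E", "A.D.F"));  // true
        System.out.println(areSiblings("A.B.C", "A.D"));    // false
    }
}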
Can one do a singular/plural phrase match with the Query String Query?
Can one perform the following query using wildcards (instead of two distinct phrases) when using a Query String Query?

"photographic film" OR "photographic films"

These do not seem to work, and return the same number of results as just "photographic film":

"photographic film?"
"photographic film*"

Can wildcards not be placed inside exact-phrase queries? Is there a way to mimic this? My goal is to be able to perform queries like this:

"photo* film?"

... capturing:

photo film
photo films
photographic films
photography films
etc...
Specifying a Phrase within a Proximity Phrase Search?
I'm using the Query String Query to perform a Proximity Search. I'm wondering if (and if yes, how) I can nest a phrase within the overall phrase:

"wood glue manufacturer"~5 (where "wood glue" would be kept as a phrase)

My users have access to a Query String Query box, and I'm exploring more advanced search capability through this box... so performing the equivalent with other Query types is not helpful here... for instance, I know that I can use a Span Near Query to accomplish this.
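[Editor's note] For reference, a sketch of the Span Near equivalent mentioned at the end, nesting the "wood glue" phrase (as its own zero-slop span_near) inside the outer proximity clause; the field name "text" is hypothetical:

{
  "span_near": {
    "clauses": [
      {
        "span_near": {
          "clauses": [
            { "span_term": { "text": "wood" } },
            { "span_term": { "text": "glue" } }
          ],
          "slop": 0,
          "in_order": true
        }
      },
      { "span_term": { "text": "manufacturer" } }
    ],
    "slop": 5,
    "in_order": false
  }
}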
Re: [ANN] Log4j2 Elasticsearch appender
Awesome! I had been wondering to myself about this for a while.

Brian

On Friday, July 18, 2014 4:08:14 AM UTC-4, Jörg Prante wrote:

Hi, I released a Log4j2 Elasticsearch appender https://github.com/jprante/log4j2-elasticsearch in the hope it is useful.

Best,
Jörg
Dropped HTTP Connections when Indexing
I'm trying to scale my indexing for the first time, and I'm running into connection problems. I reach a scale where cURL connections from my indexers start getting cURL error 7 (connect failed). It looks like ES just stops accepting all HTTP connections for a period of time. I cannot find the root cause.

I'm running on an Amazon C3.4XL. The processors are not maxed, memory is not maxed, and IO is not showing issues. I'm not seeing problems in the ES log, but I'm not sure I have logging fully enabled. I've tried increasing the thread pool for the indexer, and that doesn't help; I'm not seeing any rejected connections there. I'm at a loss.

The closest I can get is a guess using data from Bigdesk. When the number of HTTP channels starts exceeding the number of transport channels, I start to see the problem emerge. I have no idea if this is related, but it's the only metric I've traced that seems correlated.

Thoughts?
Re: Updating Datatype in Elasticsearch
Within my configuration directory's templates/automap.json file is the following template. Elasticsearch uses this template whenever it creates a new logstash index each day:

{
  "automap" : {
    "template" : "logstash-*",
    "settings" : {
      "index.mapping.ignore_malformed" : true
    },
    "mappings" : {
      "_default_" : {
        "numeric_detection" : true,
        "_all" : { "enabled" : false },
        "properties" : {
          "message" : { "type" : "string" },
          "host" : { "type" : "string" },
          "UUID" : { "type" : "string", "index" : "not_analyzed" },
          "logdate" : { "type" : "string", "index" : "no" }
        }
      }
    }
  }
}

Note:

1. How to ignore malformed data (for example, a numeric field that contains no-data every once in a while).
2. How to automatically detect numeric fields. Logstash makes every JSON value a string. Elasticsearch automatically detects dates, but must be explicitly configured to automatically detect numeric fields.
3. Fields that must be treated as strings even if they contain numeric values, or must not be analyzed, or must not be indexed at all, are listed explicitly.
4. Disabling of the _all field: as long as your logstash configuration leaves the message field pretty much intact, disabling the _all field will reduce disk space and increase performance while still keeping all search functionality. But then don't forget to also update your Elasticsearch configuration to specify message as the default field, as shown below.

Hope this helps!

Brian
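[Editor's note] For point 4, a minimal sketch of the default-field setting. It can go in elasticsearch.yml:

index.query.default_field: message

or, equivalently, as a Java option when starting Elasticsearch (the form used elsewhere in this archive):

-Des.index.query.default_field=message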
Re: Python version for curator
No joy:

$ pip install elasticsearch
Requirement already satisfied (use --upgrade to upgrade): elasticsearch in /usr/lib/python2.6/site-packages
Cleaning up...

$ curator --help
Traceback (most recent call last):
  File "/usr/bin/curator", line 5, in <module>
    from pkg_resources import load_entry_point
  File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 2655, in <module>
    working_set.require(__requires__)
  File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 648, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 546, in resolve
    raise DistributionNotFound(req)
pkg_resources.DistributionNotFound: elasticsearch>=1.0.0,<2.0.0

$ uname -a
Linux elktest 2.6.32-431.17.1.el6.x86_64 #1 SMP Wed May 7 23:32:49 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Brian
Python version for curator
A quick question: Is Python 2 acceptable for use with curator, or is Python 3 required?

Thanks!

Brian
Re: Python version for curator
To continue, I installed curator on a Python 2.6.6 system thusly:

pip install elasticsearch-curator

And Elasticsearch 1.2.1 is installed on the same server. But when running curator --help, I see:

$ curator --help
Traceback (most recent call last):
  File "/usr/bin/curator", line 5, in <module>
    from pkg_resources import load_entry_point
  File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 2655, in <module>
    working_set.require(__requires__)
  File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 648, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 546, in resolve
    raise DistributionNotFound(req)
pkg_resources.DistributionNotFound: elasticsearch>=1.0.0,<2.0.0

This was per the information found at: https://github.com/elasticsearch/curator

I'm not a Python dev (yet, anyway), but I don't believe I left anything out that was explicitly mentioned on the curator github page.

Brian

On Monday, July 14, 2014 3:00:27 PM UTC-4, Brian wrote:

A quick question: Is Python 2 acceptable for use with curator, or is Python 3 required?

Thanks!

Brian
Sorting on Parent/Child attributes
Hello, I'm looking for a solution to a problem I am having. Let's say I have 2 types, Person and Pet, in an index called customers:

Person
- account
- firstname
- lastname
- SSN

Pet
- name
- type
- id
- account

I would like to query/filter on fields in both Person and Pet in order to retrieve people and their associated pets. Additionally, I need to sort on a field that could be in either Person or Pet. For example: retrieve all people/pets that have wildcard person.firstname '*ave*' and wildcard pet.type '*terrier*', and sort on pet.name. Or wildcard search on person.SSN = '*55*' and pet.name = '*mister*', and sort on person.lastname.

I currently have a solution where I search/sort on people or pets based on the sort that I am using. I use a hasChild/hasParent query to handle the fields that are on the 'other' type. Then I use an id field to retrieve the entities of the other type. So, if I have a sort on person.firstname, I query on person and child (pet) and sort on person.firstname, then use the accounts to retrieve the pets (by account) in another query. This is not ideal, because it is ugly, and I suspect it will be difficult to maintain if this query's requirements change in the future.

I suspect that I can do a query at the 'customers' level and do 'type' queries on the fields that I need for person and pet, similar to this: http://joelabrahamsson.com/grouping-in-elasticsearch-using-child-documents/

However, I'm not sure how I would implement the sort. I suspect that I could use a custom scoring script, but I am not sure how I would score text fields. Any thoughts?
Re: Setting id of document with elasticsearch-hadoop that is not in source document
I was just curious if there was a way of doing this without adding the field; I can add it if necessary. For alternatives: what if, in addition to es.mapping.id, there were another property available, like es.mapping.id.include.in.src, where you could specify whether the id field actually gets included in the source document? In elasticsearch, you can create and update documents without having to include the id in the source document, so I think it would make sense to be able to do that with elasticsearch-hadoop also. On Thursday, July 10, 2014 5:49:18 PM UTC-4, Costin Leau wrote: You need to specify the id of the document you want to update somehow. Since in es-hadoop things are batch focused, each doc needs its own id specified somehow, hence the use of 'es.mapping.id' to indicate its value. Is there a reason why this approach does not work for you - any alternatives that you thought of? Cheers, On 7/7/14 10:48 PM, Brian Thomas wrote: I am trying to update an elasticsearch index using elasticsearch-hadoop. I am aware of the es.mapping.id configuration where you can specify the field in the document to use as an id, but in my case the source document does not have the id (I used elasticsearch's autogenerated id when indexing the document). Is it possible to specify the id to update without having to add a new field to the MapWritable object? -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ce6161ad-d442-4ffb-9162-114cb8cd76dd%40googlegroups.com. For more options, visit https://groups.google.com/d/optout. -- Costin -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/77259ed3-a896-47cc-9304-cc32046756ad%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Setting id of document with elasticsearch-hadoop that is not in source document
I was just curious if there was a way of doing this without adding the field; I can add it if necessary. For alternatives: what if, in addition to es.mapping.id, there were another property available, like es.mapping.id.exclude, that would not include the id field in the source document? In elasticsearch, you can create and update documents without having to include the id in the source document, so I think it would make sense to be able to do that with elasticsearch-hadoop also. On Thursday, July 10, 2014 5:49:18 PM UTC-4, Costin Leau wrote: You need to specify the id of the document you want to update somehow. Since in es-hadoop things are batch focused, each doc needs its own id specified somehow, hence the use of 'es.mapping.id' to indicate its value. Is there a reason why this approach does not work for you - any alternatives that you thought of? Cheers, On 7/7/14 10:48 PM, Brian Thomas wrote: I am trying to update an elasticsearch index using elasticsearch-hadoop. I am aware of the es.mapping.id configuration where you can specify the field in the document to use as an id, but in my case the source document does not have the id (I used elasticsearch's autogenerated id when indexing the document). Is it possible to specify the id to update without having to add a new field to the MapWritable object? -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ce6161ad-d442-4ffb-9162-114cb8cd76dd%40googlegroups.com. For more options, visit https://groups.google.com/d/optout. -- Costin -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/2c6753aa-c459-489b-9f86-6803a5616718%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
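For reference, a minimal sketch of the es.mapping.id approach Costin describes, where the previously autogenerated ES id is carried as an ordinary field in the MapWritable. The field name docId and the sample values are illustrative, and note that this is precisely the case where the id also lands in _source, which is the behavior being questioned above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;

public class EsHadoopIdSketch {

    // Tell es-hadoop which field of each document carries the ES document id.
    public static Configuration esConf(String hostPort) {
        Configuration conf = new Configuration();
        conf.set("es.nodes", hostPort);
        conf.set("es.resource", "media/docs");
        conf.set("es.mapping.id", "docId");
        return conf;
    }

    // Build a document carrying the id captured from the earlier indexing run.
    public static MapWritable docWithId(String esId) {
        MapWritable doc = new MapWritable();
        doc.put(new Text("docId"), new Text(esId));
        doc.put(new Text("title"), new Text("updated title"));
        return doc;
    }
}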
Re: java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES when trying to query elasticsearch using spark
Here is the gradle build I was using originally:

apply plugin: 'java'
apply plugin: 'eclipse'

sourceCompatibility = 1.7
version = '0.0.1'
group = 'com.spark.testing'

repositories {
    mavenCentral()
}

dependencies {
    compile 'org.apache.spark:spark-core_2.10:1.0.0'
    compile 'edu.stanford.nlp:stanford-corenlp:3.3.1'
    compile group: 'edu.stanford.nlp', name: 'stanford-corenlp', version: '3.3.1', classifier: 'models'
    compile files('lib/elasticsearch-hadoop-2.0.0.jar')
    testCompile 'junit:junit:4.+'
    testCompile group: 'com.github.tlrx', name: 'elasticsearch-test', version: '1.2.1'
}

When I ran dependencyInsight on jackson, I got the following output:

C:\dev\workspace\SparkProject> gradle dependencyInsight --dependency jackson-core
:dependencyInsight
com.fasterxml.jackson.core:jackson-core:2.3.0
\--- com.fasterxml.jackson.core:jackson-databind:2.3.0
     +--- org.json4s:json4s-jackson_2.10:3.2.6
     |    \--- org.apache.spark:spark-core_2.10:1.0.0
     |         \--- compile
     \--- com.codahale.metrics:metrics-json:3.0.0
          \--- org.apache.spark:spark-core_2.10:1.0.0 (*)

org.codehaus.jackson:jackson-core-asl:1.0.1
\--- org.codehaus.jackson:jackson-mapper-asl:1.0.1
     \--- org.apache.hadoop:hadoop-core:1.0.4
          \--- org.apache.hadoop:hadoop-client:1.0.4
               \--- org.apache.spark:spark-core_2.10:1.0.0
                    \--- compile

Version 1.0.1 of jackson-core-asl does not have the field ALLOW_UNQUOTED_FIELD_NAMES, but later versions of it do. On Sunday, July 6, 2014 4:28:56 PM UTC-4, Costin Leau wrote: Hi, Glad to see you sorted out the problem. Out of curiosity, what version of jackson were you using and what was pulling it in? Can you share your maven pom/gradle build? On Sun, Jul 6, 2014 at 10:27 PM, Brian Thomas brianjt...@gmail.com wrote: I figured it out, dependency issue in my classpath. Maven was pulling down a very old version of the jackson jar. I added the following line to my dependencies and the error went away: compile 'org.codehaus.jackson:jackson-mapper-asl:1.9.13' On Friday, July 4, 2014 3:22:30 PM UTC-4, Brian Thomas wrote: I am trying to test querying elasticsearch using Apache Spark using elasticsearch-hadoop. I am just trying to do a query to the elasticsearch server and return the count of results.
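If adding a direct jackson dependency feels too blunt, a sketch of pinning the version in Gradle instead (resolutionStrategy.force is standard Gradle; the versions are the ones that came up in this thread):

configurations.all {
    resolutionStrategy {
        // Ensure the ancient 1.0.1 jackson-*-asl pulled in via hadoop-core loses out to
        // a release that actually has JsonParser.Feature.ALLOW_UNQUOTED_FIELD_NAMES.
        force 'org.codehaus.jackson:jackson-core-asl:1.9.13'
        force 'org.codehaus.jackson:jackson-mapper-asl:1.9.13'
    }
}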
Below is my test class using the Java API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.serializer.KryoSerializer;
import org.elasticsearch.hadoop.mr.EsInputFormat;
import scala.Tuple2;

public class ElasticsearchSparkQuery {
    public static int query(String masterUrl, String elasticsearchHostPort) {
        SparkConf sparkConfig = new SparkConf().setAppName("ESQuery").setMaster(masterUrl);
        sparkConfig.set("spark.serializer", KryoSerializer.class.getName());
        JavaSparkContext sparkContext = new JavaSparkContext(sparkConfig);
        Configuration conf = new Configuration();
        conf.setBoolean("mapred.map.tasks.speculative.execution", false);
        conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
        conf.set("es.nodes", elasticsearchHostPort);
        conf.set("es.resource", "media/docs");
        conf.set("es.query", "?q=*");
        JavaPairRDD<Text, MapWritable> esRDD = sparkContext.newAPIHadoopRDD(conf,
                EsInputFormat.class, Text.class, MapWritable.class);
        return (int) esRDD.count();
    }
}

When I try to run this I get the following error:

14/07/04 14:58:07 INFO executor.Executor: Running task ID 0
14/07/04 14:58:07 INFO storage.BlockManager: Found block broadcast_0 locally
14/07/04 14:58:07 INFO rdd.NewHadoopRDD: Input split: ShardInputSplit [node=[5UATWUzmTUuNzhmGxXWy_w/S'byll|10.45.71.152:9200],shard=0]
14/07/04 14:58:07 WARN mr.EsInputFormat: Cannot determine task id...
14/07/04 14:58:07 ERROR executor.Executor: Exception in task ID 0
java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES
    at org.elasticsearch.hadoop.serialization.json.JacksonJsonParser.<clinit>(JacksonJsonParser.java:38)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:75)
    at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:267)
    at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:75)
    at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.next(EsInputFormat.java:319)
    at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.nextKeyValue(EsInputFormat.java:255)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:122)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
    at org.apache.spark.util.Utils$.getIteratorSize
Setting id of document with elasticsearch-hadoop that is not in source document
I am trying to update an elasticsearch index using elasticsearch-hadoop. I am aware of the es.mapping.id configuration where you can specify the field in the document to use as an id, but in my case the source document does not have the id (I used elasticsearch's autogenerated id when indexing the document). Is it possible to specify the id to update without having to add a new field to the MapWritable object? -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ce6161ad-d442-4ffb-9162-114cb8cd76dd%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES when trying to query elasticsearch using spark
I figured it out, dependency issue in my classpath. Maven was pulling down a very old version of the jackson jar. I added the following line to my dependencies and the error went away:

compile 'org.codehaus.jackson:jackson-mapper-asl:1.9.13'

On Friday, July 4, 2014 3:22:30 PM UTC-4, Brian Thomas wrote: I am trying to test querying elasticsearch using Apache Spark using elasticsearch-hadoop. I am just trying to do a query to the elasticsearch server and return the count of results. Below is my test class using the Java API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.serializer.KryoSerializer;
import org.elasticsearch.hadoop.mr.EsInputFormat;
import scala.Tuple2;

public class ElasticsearchSparkQuery {
    public static int query(String masterUrl, String elasticsearchHostPort) {
        SparkConf sparkConfig = new SparkConf().setAppName("ESQuery").setMaster(masterUrl);
        sparkConfig.set("spark.serializer", KryoSerializer.class.getName());
        JavaSparkContext sparkContext = new JavaSparkContext(sparkConfig);
        Configuration conf = new Configuration();
        conf.setBoolean("mapred.map.tasks.speculative.execution", false);
        conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
        conf.set("es.nodes", elasticsearchHostPort);
        conf.set("es.resource", "media/docs");
        conf.set("es.query", "?q=*");
        JavaPairRDD<Text, MapWritable> esRDD = sparkContext.newAPIHadoopRDD(conf,
                EsInputFormat.class, Text.class, MapWritable.class);
        return (int) esRDD.count();
    }
}

When I try to run this I get the following error:

14/07/04 14:58:07 INFO executor.Executor: Running task ID 0
14/07/04 14:58:07 INFO storage.BlockManager: Found block broadcast_0 locally
14/07/04 14:58:07 INFO rdd.NewHadoopRDD: Input split: ShardInputSplit [node=[5UATWUzmTUuNzhmGxXWy_w/S'byll|10.45.71.152:9200],shard=0]
14/07/04 14:58:07 WARN mr.EsInputFormat: Cannot determine task id...
14/07/04 14:58:07 ERROR executor.Executor: Exception in task ID 0
java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES
    at org.elasticsearch.hadoop.serialization.json.JacksonJsonParser.<clinit>(JacksonJsonParser.java:38)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:75)
    at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:267)
    at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:75)
    at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.next(EsInputFormat.java:319)
    at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.nextKeyValue(EsInputFormat.java:255)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:122)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
    at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1014)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
    at org.apache.spark.scheduler.Task.run(Task.scala:51)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Has anyone run into this issue with the JacksonJsonParser? -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9c2b2f2e-5196-4a72-bfbc-4cd0fda9edf0%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES when trying to query elasticsearch using spark
I am trying to test querying elasticsearch using Apache Spark using elasticsearch-hadoop. I am just trying to do a query to the elasticsearch server and return the count of results. Below is my test class using the Java API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.serializer.KryoSerializer;
import org.elasticsearch.hadoop.mr.EsInputFormat;
import scala.Tuple2;

public class ElasticsearchSparkQuery {
    public static int query(String masterUrl, String elasticsearchHostPort) {
        SparkConf sparkConfig = new SparkConf().setAppName("ESQuery").setMaster(masterUrl);
        sparkConfig.set("spark.serializer", KryoSerializer.class.getName());
        JavaSparkContext sparkContext = new JavaSparkContext(sparkConfig);
        Configuration conf = new Configuration();
        conf.setBoolean("mapred.map.tasks.speculative.execution", false);
        conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
        conf.set("es.nodes", elasticsearchHostPort);
        conf.set("es.resource", "media/docs");
        conf.set("es.query", "?q=*");
        JavaPairRDD<Text, MapWritable> esRDD = sparkContext.newAPIHadoopRDD(conf,
                EsInputFormat.class, Text.class, MapWritable.class);
        return (int) esRDD.count();
    }
}

When I try to run this I get the following error:

14/07/04 14:58:07 INFO executor.Executor: Running task ID 0
14/07/04 14:58:07 INFO storage.BlockManager: Found block broadcast_0 locally
14/07/04 14:58:07 INFO rdd.NewHadoopRDD: Input split: ShardInputSplit [node=[5UATWUzmTUuNzhmGxXWy_w/S'byll|10.45.71.152:9200],shard=0]
14/07/04 14:58:07 WARN mr.EsInputFormat: Cannot determine task id...
14/07/04 14:58:07 ERROR executor.Executor: Exception in task ID 0
java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES
    at org.elasticsearch.hadoop.serialization.json.JacksonJsonParser.<clinit>(JacksonJsonParser.java:38)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:75)
    at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:267)
    at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:75)
    at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.next(EsInputFormat.java:319)
    at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.nextKeyValue(EsInputFormat.java:255)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:122)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
    at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1014)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
    at org.apache.spark.scheduler.Task.run(Task.scala:51)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Has anyone run into this issue with the JacksonJsonParser? -- You received this message because you are subscribed to the Google Groups elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9da5ae25-3e57-4c24-ab45-c62c987ebec0%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: get rid of _all to optimize storage and perfs (Re: Splunk vs. Elastic search performance?)
Patrick,

I had said: "Well, I did answer your question. But probably not from the direction you expected." You replied: "hmm no, you didn't. My question was: it looks like I can't retrieve/display [_all fields] content. Any idea? And you replied with your logstash template where _all is disabled. I'm interested in disabling _all, but that was not my question at this point."

Fair enough. I don't know the inner details; I am just an enthusiastic end user. To the best of my knowledge, there is no content for the _all field; I view it as an Elasticsearch pseudo-field whose name is _all and whose index terms are taken from all fields (by default), but there is still no actual content stored for it. And since I got into the habit of disabling the _all field, my hands-on exploration of its nuances has ended. It's time for the experts to explain!

You also wrote: "Your answer to my second message, below, is informative and interesting but fails to answer my second question too. I simply asked whether I need to feed the complete modified mapping of my template or if I can just push the modified part (i.e. the _all: {enabled: false} part)."

Again, I have never done this, so I can only tell you what I do. I just cannot tell you all the nuances of what Elasticsearch is capable of. My recommendation is to try it. Elasticsearch is great at letting you experiment and then telling you clearly whether your attempt succeeded or failed. So, try your scenario. If it fails, then it didn't work or you did something wrong. If it succeeds, then you can see exactly what Elasticsearch actually accepted as your mapping. For example:

curl 'http://localhost:9200/logstash-2014.06.30/_mapping?pretty=true' ; echo

This particular query looks at one of my logstash-generated indices, and it lets me verify that Elasticsearch and Logstash conspired to create the mappings I expected. I used this command quite a bit until I finally got everything configured correctly. (I actually verify the mapping via Elasticsearch Head, but under the covers it's the same command.) Brian -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8eaefd0e-f684-4f44-9fcb-3137812a99d3%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
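In the same spirit, to see what a template currently contains before deciding whether to re-push the whole thing (automap is my template's name; substitute your own):

curl 'http://localhost:9200/_template/automap?pretty=true' ; echo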
Re: Kibana browser compatibility issues
Laura, The simplest way is to install Kibana as a site plug-in on the same node on which you run Elasticsearch. Not the best way from a performance and security perspective, but certainly the easiest way to start with an absolute minimum of extra levers to pull and knobs to turn, so to speak. So what does that really mean, a site plugin? Assume you configure Elasticsearch to look for plugins within the /opt/elk/plugins directory. Then you unpack the Kibana3 distribution within /opt/kibana3. That means you'll see the following files within /opt/kibana3/kibana-3.1.0: app build.txt config.js css favicon.ico font img index.html LICENSE.md README.md vendor So then create the /opt/elk/plugins/kibana3 directory. Then: $ ln -s /opt/kibana3/kibana-3.1.0 /opt/elk/plugins/kibana3/_site Now when you start ES and point it to the correct configuration file which in turn points it to the plugins directory as described above, Kibana will be available at the following URL (assuming you're on the same host; change localhost as needed, of course): http://localhost:9200/_plugin/kibana3/ Hope this helps! Brian -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/59b1ac76-d3a5-4b63-bdc6-f617ef8c0627%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
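For completeness, the elasticsearch.yml line that points ES at that plugins directory would look like this (path matching the example above; adjust as needed):

path.plugins: /opt/elk/plugins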
Re: Copy index from production to development instance
Thank you for your suggestion. I tried the stream2es library but I get an OutOfMemoryError when trying to use it. On Friday, June 6, 2014 5:13:19 PM UTC-4, Antonio Augusto Santos wrote: Take a look at stream2es https://github.com/elasticsearch/stream2es On Friday, June 6, 2014 2:13:06 PM UTC-3, Brian Lamb wrote: I should also point out that I had to edit a file in the metadata-snapshot file to change around the s3 keys and bucket name to match what development was expecting. On Friday, June 6, 2014 1:11:57 PM UTC-4, Brian Lamb wrote: Hi all, I want to do a one-time copy of the data on my production elasticsearch instance to my development elasticsearch instance. Both are managed by AWS, if that makes this easier. Here is what I tried.

On production:

curl -XPUT 'http://localhost:9200/_snapshot/my_s3_repository' -d '{
  "type": "s3",
  "settings": {
    "access_key": "productionAccessKey",
    "bucket": "productionBucketName",
    "region": "region",
    "secret_key": "productionSecretKey"
  }
}'
curl -XPUT "http://localhost:9200/_snapshot/my_s3_repository/snapshot_2014_06_02"

What this does is upload the instance to a production-level s3 bucket. Then in the aws console, I copy all of it to a development-level s3 bucket. Next, on development:

curl -XPUT 'http://localhost:9200/_snapshot/my_s3_repository' -d '{
  "type": "s3",
  "settings": {
    "access_key": "developmentAccessKey",
    "bucket": "developmentBucketName",
    "region": "region",
    "secret_key": "developmentSecretKey"
  }
}'
curl -XPOST "http://localhost:9200/_snapshot/my_s3_repository/snapshot_2014_06_02/_restore"

This gives me the following message:

$ curl -XPOST "http://localhost:9200/_snapshot/my_s3_repository/snapshot_2014_06_02/_restore?pretty=true"
{
  "error" : "SnapshotException[[my_s3_repository:snapshot_2014_06_02] failed to get snapshots]; nested: IOException[Failed to get [snapshot-snapshot_2014_06_02]]; nested: AmazonS3Exception[Status Code: 404, AWS Service: Amazon S3, AWS Request ID: RequestId, AWS Error Code: NoSuchKey, AWS Error Message: The specified key does not exist.]; ",
  "status" : 500
}

Also, when I try to get the snapshots, I get the following:

$ curl -XGET localhost:9200/_snapshot/_status?pretty=true
{
  "snapshots" : [ ]
}

This leads me to believe that I am not connecting the snapshot correctly, but I'm not sure what I am doing incorrectly. Regenerating the index on development is not really a possibility, as it took a few months to generate the index the first time around. If there is a better way to do this, I'm all for it. Thanks, Brian Lamb -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0ed279b6-599f-4c90-917a-d377622e12cd%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
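Two hedged notes on debugging this, based on my reading of the 1.x snapshot API: _snapshot/_status without a snapshot name lists only snapshots that are currently running, so an empty list is expected when nothing is in flight. To see what the repository itself contains (depending on your ES version), ask the repository directly:

curl -XGET 'http://localhost:9200/_snapshot/my_s3_repository/_all?pretty=true' ; echo

If that also returns NoSuchKey, the development repository registration is likely pointing at a different bucket or path than where the copied snapshot files actually landed.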
Returning Many Large Documents and Performance
I'm executing a query where I could possibly return 100k results. The documents are quite large, about 3.6 kb per document, 312 mb for 100k of these. When executing the query in ES, the query itself is somewhat fast, about 5 seconds. But it takes longer than a minute to get the results back from the server. What can be done to improve this performance? Is ElasticSearch not meant to handle such large documents? -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/290879ce-64cf-497e-b8c0-b962646daeae%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
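For what it's worth, the usual approach for pulling a result set this large is scan/scroll, which streams the hits in pages instead of building one 300+ MB response, optionally combined with fetching only the fields you need. A minimal sketch (ES 1.x syntax; the index name myindex is a placeholder, and note that size is per shard in scan mode):

curl -XGET 'localhost:9200/myindex/_search?search_type=scan&scroll=1m&size=100' -d '{
  "query" : { "match_all" : {} }
}'

# Then repeatedly, passing back the _scroll_id from each previous response:
curl -XGET 'localhost:9200/_search/scroll?scroll=1m' -d '<scroll_id from previous response>'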
Pagination: Determine Page Number Of A Record
I have a requirement where a document could be anywhere in a result set, and I need to calculate a page number according to where this document is in the results. I've been trying many different ideas, such as using a script to calculate the page number based on the total count and a counter variable, but the counter keeps getting reset every time a shard is queried. I also tried returning the entire result set and calculating this value in .NET, but ES takes too long to complete a query request for sizes of 8000 or more. I realize we shouldn't be returning this many results, but scan and scroll is not an option because I would need to parse each response to see if the document I'm looking for is in it. From and size also won't work because I have no idea what the 'from' value will be; that is the value I'm trying to calculate. I guess my question is: does anyone have an idea of how to calculate a page number for a given document inside a query result? Perhaps there is some functionality in ES that will tell you a document with a certain ID is the nth document in the entire result set? -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/3c2458c1-629c-42d8-8a7e-551c6c093cda%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Proper parsing of String values like 1m, 1q HOUR etc.
Thomas, The TimeValue class handles precisely defined time periods (well, pretty much, anyway). In other words, 1s is one second. 1w is always 7d (leap seconds notwithstanding, but that doesn't really affect the precision). But what is one year? 365 days? 365.25 days? 366 days in a leap year? What is one quarter? Exactly 91.25d (which is 365 / 4)? Or 3 months? But then, what is a month? 28 days? 31 days? Use 28d or 31d if that's what you mean; 1 month has no deterministic meaning all by itself. And 1 quarter is 3 months but without any deterministic way to convert to a precise number of milliseconds. The TimeValue class has no support for locale nor day of year nor leap year nor days in a month. It's best to use Joda time if you wish to perform proper year-oriented calculations. And it will return milliseconds precision if you wish, which will plug directly back into a TimeValue. Brian -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/05798106-2a3d-4b7a-8a06-572116e0694b%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Filtering on Script Value
I'm trying to calculate a value for each hit and then select or filter on that calculated value. Something like below:

"query": { "match_all": {} },
"script_fields" : {
  "counter" : {
    "script" : "count++",
    "params" : { "count" : 1 }
  },
  "source" : {
    "script" : "_source"
  }
}

I'd like to filter on the count parameter. I've read in a StackOverflow post that you cannot filter on a script value. So is there another way to calculate some value dynamically and filter on that value? If not, is there a nested SQL SELECT equivalent in ElasticSearch? Maybe I could execute the first query to calculate the 'count' and then execute another query to filter by a value? -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8206ea84-b314-4b8e-8f3c-248d9f5a99e7%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: script_fields vs filter script
I'm trying to filter on a calculated script field as well. Have you figured this out, Kajal? On Tuesday, May 27, 2014 10:49:35 AM UTC-6, Kajal Patel wrote: Hey, can you actually post your solution if you figured it out? I am having a similar issue: I need to filter search results based on a script_field. I don't want to use filter_script, though, because I am using facets and I want my records to be filtered out for facets too. Do you know if I can extend any class, or use any plugin or anything, to filter my records based on the script field? On Sunday, July 7, 2013 1:21:38 PM UTC-4, Oreno wrote: Hi Alex, 1. I checked the cache solution but it's taking 15 times longer than my starting time (10s versus 150s), so that will be a problem since my filter has dynamic params. It does go fast once it's stored, though. Do you know if it's possible to do some kind of caching of all source documents for future queries? 2. From what I understand, both the filter script and the script_field are supposed to go over each document that results from the prior query. The only thing I can think of that makes the difference is that the script filter actually needs to filter out the false documents (for the hit count) while the script_field only needs to add the field for the first 10 documents returned by default. I'm trying to figure out how I can speed up the response when using source() in a native Java script. I'm assuming the bottleneck is somewhere within creating the response. I read that using source has some overhead because elasticsearch has to parse the JSON source, but if that were the case here, then I should have received the same big overhead for both the script_field and filter script runs. All I actually need is the hit count, so if I'm correct about the response parsing and that can be excluded, I'll be really glad. Any idea on the above? Appreciating your help. Oren On Sun, Jul 7, 2013 at 7:13 PM, Alexander Reelsen-2 [via ElasticSearch Users] [hidden email] wrote: Hey, what kind of query are you executing? Using script fields results in the script only being executed for each search hit, whereas executing it as a script filter might require it to execute for each document in your index (you can try to cache the script filter so it might be faster for subsequent requests). Hope this helps as a start for optimization; if not, please provide some more information. --Alex On Sun, Jul 7, 2013 at 2:21 PM, oreno [hidden email] wrote: Hi, I notice that using a script_fields that returns true or false values goes much faster than using the same script but with a filter script declaration (so it will filter the docs returning false). I was sure that the filter script was taking so long because I'm using the source().get(...) method, but it turns out that when using the same script, only with script_fields instead, I get the performance I need. The only problem here is that I want to filter the docs that now have MessageReverted = false. 1. Any way I can filter the docs containing MessageReverted = false? (some wrapper query?) 2. Any idea why the filter script takes much longer than the script field (8000 ms against 250 ms)? Both ways retrieve the source() for the script logic, so it can't be a matter of source fetching as far as I understand.
fast:

..., "script_fields": {
  "MessageReverted": {
    "script": "revert",
    "lang": "native",
    "params": {
      "startDate": "2013-05-1",
      "endDate": "2013-05-1",
      "attributeId": 2365443,
      "segmentId": 2365443
    }
  }
}

slow:

..., "filter": {
  "script": {
    "script": "revert",
    "lang": "native",
    "params": {
      "startDate": "2013-05-1",
      "endDate": "2013-05-1",
      "attributeId": 2365443,
      "segmentId": 2365443
    }
  }
}

Any idea? Thanks in advance, Oren -- View this message in context: http://elasticsearch-users.115913.n3.nabble.com/script-fields-vs-filter-script-tp4037658.html Sent from the ElasticSearch Users mailing list archive at Nabble.com. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email]. For more options, visit https://groups.google.com/groups/opt_out. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email]. For more options, visit https://groups.google.com/groups/opt_out.
Re: [logstash-users] Re: Kibana dashboards - A community repository
Thanks, Mark! That really helps a lot. Starting with the excellent logstash book, this example https://git.openstack.org/cgit/openstack-infra/config/tree/modules/openstack_project/templates/logstash/indexer.conf.erb also helped quite a bit. It was referenced from here http://ci.openstack.org/logstash.html. Brian On Monday, June 23, 2014 6:51:56 AM UTC-4, Mark Walkom wrote: I'm definitely open to expanding this. I am thinking it might even grow to include LS configs (eg custom grok patterns), as they are an important part of the visuals. Regards, Mark Walkom -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/5e92485a-706e-49f2-831f-8a8c2e9aaac7%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Count not working for Java API, works for REST
Perhaps you need to insert the execute().actionGet() method calls, as below?

CountRequestBuilder builder = client.prepareCount(indexName)
    .setTypes("product")
    .setQuery(getQuery(req));
CountResponse response = builder.execute().actionGet(); // actually executes the request
return response.getCount();

I don't use Count, but I have used Query and Update and Delete and they all work similarly in this regard. Just a guess. Brian -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/efaf68b8-90f1-47c7-87c1-d124ae7c4bce%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: logstash CPU usage
Of the various logstash groups, the following is the one that I have found to be the most active and helpful: https://groups.google.com/forum/#!forum/logstash-users Brian -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/582c-7e12-4ba4-9847-e9976313c924%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Getting complete value from ElasticSearch query
Vinay, To be more specific: If you don't ask for any fields, then _source is returned by default. But if you ask for any fields at all, then _source is not included by default. Therefore, if you wish to include _source along with other fields, you must explicitly ask for _source along with those other fields. Brian -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b6907975-ce05-4a9a-88cc-cf234e0c1990%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
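A quick illustration of that rule (the index and field names here are made up; ES 1.x syntax) — this asks for another field plus the source together, which is the case where _source must be listed explicitly:

curl -XGET 'localhost:9200/myindex/_search?pretty=true' -d '{
  "query" : { "match_all" : {} },
  "fields" : [ "_source", "host" ]
}'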
Re: Bulk API possible bug
Hi, Pablo. I remember reading that Elasticsearch will happily store an invalid JSON string as your _source. From my usage of the Java API, I noticed that the Jackson library is used, but that only the stream parser is present. What this tells me is that ES is likely parsing your JSON token-by-token and has processed and indexed most of it. In other words, an error isn't an all-or-nothing situation. Since your syntax error happens at the very end of the document, Elasticsearch has indexed all of the document before it encounters the error. My guess is that if the error was not at the very end of the document, then Elasticsearch would fail to process and index any information past the error, but would successfully process and index information (if any) before the error. Brian -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/042fcbfd-9575-4543-b6b1-2328af05b1fe%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Proper parsing of String values like 1m, 1q HOUR etc.
1w means one week, 12.3d means 12.3 days, 52w means 52 weeks, 4h means 4 hours, 12.3ms means 12.3 milliseconds, and 12 means 12 milliseconds, but without the suffix the value must be an integer. In other words, TimeValue supports parsing a String that contains a long integer digit string to mean milliseconds, or an integer or floating point digit string with a suffix. So a WEEK is represented as 1w or 7d, and an HOUR is represented as 1h or 60m. So if you want to support your own vocabulary, then create a wrapper class that converts your own terms to TimeValue strings and then passes them into the TimeValue class. Brian On Tuesday, June 17, 2014 11:31:37 AM UTC-4, Thomas wrote: Hi, I was wondering whether there is a proper utility class to parse the given values and get the duration in milliseconds, for values such as 1m (which means 1 minute), 1q (which means 1 quarter), etc. I have found that elasticsearch utilizes the class TimeValue, but it only parses up to a week, and values such as WEEK and HOUR are not accepted. So is there any utility class in the elasticsearch source that does the job? (for histograms, ranges, wherever it is needed) Thank you Thomas -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/3078016f-fe47-468b-a36f-c19f2a5c607d%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
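A minimal sketch of that wrapper idea against the 1.x TimeValue class. The class name, the vocabulary, and the 1q-equals-13-weeks approximation are all my own illustrative choices, since (as discussed in this thread) a quarter has no single precise length:

import org.elasticsearch.common.unit.TimeValue;

public class TimeVocabulary {

    // Convert a coarse vocabulary term into a string that TimeValue can parse.
    public static TimeValue parse(String value) {
        if ("HOUR".equalsIgnoreCase(value)) {
            value = "1h";
        } else if ("WEEK".equalsIgnoreCase(value)) {
            value = "1w";
        } else if (value.endsWith("q")) {
            // TimeValue has no quarter suffix; approximate one quarter as 13 weeks.
            int quarters = Integer.parseInt(value.substring(0, value.length() - 1));
            value = (13 * quarters) + "w";
        }
        return TimeValue.parseTimeValue(value, TimeValue.timeValueMillis(0));
    }
}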
Re: issues with file input from logstash to elastic - please read
Thanks so much for the feedback, Ivan. One more question: We have two different forms of rotated files (on *IX systems; no Windows servers): 1. Standard log4j rotation: The XXX.log file is renamed to XXX-date.log and a new XXX.log file is created. The name doesn't change, but the inode changes. 2. When we switched many of our applications to use log4j2, we don't rotate the log files using log4j2. Instead, we have a cron job that, once per hour, makes a copy of the XXX.log file and then truncates the XXX.log file; in the background it compresses the copy. In this case, the name doesn't change, the inode doesn't change, but the size suddenly drops to 0 before it starts filling again from the beginning. The GNU tail -F command handles both of these equally perfectly. Does logstash also handle both of these cases? Thanks in advance! P.S. I am not a logstash expert either, but it's been a lot of fun to rediscover Elasticsearch from the ELK perspective (auto-mapping, auto-creation of indices, and so on). Brian On Saturday, June 21, 2014 10:42:37 AM UTC-4, Ivan Brusic wrote: The path shows an windows file name, so I am not sure if using tail would work. On cygwin, there is no -F option, at least on the version I use. On Linux, the file input works great, especially with rotated file. I am not a Logstash expert, but I use the file input with the sincedb option (sincedb_path) and it has worked since day one. -- Ivan -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9f1433e1-748e-4a20-980f-5112a1f965fa%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Splunk vs. Elastic search performance?
Thomas, Thanks for your insights and experiences. As someone who has explored and used ES for over a year but is relatively new to the ELK stack, I find your data points extremely valuable. Let me offer some of my own views.

Re: double the storage. I strongly recommend that ELK users disable the _all field. The entire text of the log events generated by logstash ends up in the message field (and not @message, as many people incorrectly post). So the _all field is just redundant overhead with no added value. The result is a dramatic drop in database file sizes and a dramatic increase in load performance. Of course, you then need to configure ES to use the message field as the default for a Lucene Kibana query.

During the year that I've used ES and watched this group, I have been on the front line of a brand new product with a smart and dedicated development team working steadily to improve the product. Six months ago, the ELK stack eluded me and reports weren't encouraging (with the sole exception of the Kibana web site's marketing pitch). But ES has come a long way in the past six months, and the ELK stack is much more closely integrated.

The Splunk UI is carefully crafted to isolate users from each other and prevent external (to the Splunk db itself, not to our company) users from causing harm to data. But Kibana seems to be meant for a small cadre of trusted users. What if I write a dashboard with the same name as someone else's? Kibana doesn't even begin to discuss user isolation. But I am confident that it will. How can I tell Kibana to set the default Lucene query operator to AND instead of OR? Google is not my friend: I keep getting references to the Ruby versions of Kibana, and that's ancient history by now. Kibana is cool and promising, but it has a long way to go before deployment to all of the folks in our company who currently have access to Splunk.

Logstash has a nice book that's been very helpful, and logstash itself has been an excellent tool for prototyping. The book has been invaluable in helping me extract dates from log events and handle all of our different multiline events. But it still doesn't explain why the date filter needs a different array of matching strings to get the date that the grok filter has already matched and isolated. And recommendations to avoid the elasticsearch_http output and use elasticsearch (via the Node client) directly contradict the fact that logstash's 1.1.1 version of the ES client library is not compatible with the most recent 1.2.1 version of ES. And logstash is also a resource hog, so we eventually plan to replace it with Perl and Apache Flume (already in use) and pipe into my Java bulk load tool (which is always kept up to date with the versions of ES we deploy!). Because we send the data via Flume to our data warehouse, any losses in ES will be annoying but won't be catastrophic. And the front-end following of rotated log files will be done using the GNU tail -F command and option. This GNU tail command with its uppercase -F option follows rotated log files perfectly. I doubt that logstash can do the same, and we currently see that neither can Splunk (so we sporadically lose log events in Splunk too). So GNU tail -F piped into logstash with the stdin input works perfectly in my evaluation setup and will likely form the first stage of any log forwarder we end up deploying.

Brian

On Thursday, June 19, 2014 8:48:34 AM UTC-4, Thomas Paulsen wrote: We had a 2.2 TB/d installation of Splunk and ran it on VMware with 12 indexers and 2 search heads.
Each indexer had 1000 IOPS guaranteed assigned. The system is slow but OK to use. We tried Elasticsearch and we were able to get the same performance with the same number of machines. Unfortunately, with Elasticsearch you need almost double the amount of storage, plus a LOT of patience to make it run. It took us six months to set it up properly, and even now the system is quite buggy and unstable, and from time to time we lose data with Elasticsearch. I don't recommend ELK for a critical production system; for just dev work it is OK, if you don't mind the hassle of setting up and operating it. The costs you save by not buying a Splunk license you have to invest into consultants to get it up and running. Our dev teams hate Elasticsearch and prefer Splunk.
Re: get rid of _all to optimize storage and perfs (Re: Splunk vs. Elastic search performance?)
Patrick, Here's my template, along with where the _all field is disabled. You may wish to add this setting to your own template, and then also add the index setting to ignore malformed data (in case someone's log entry occasionally slips in null or no-data instead of the usual numeric value):

{
  "automap" : {
    "template" : "logstash-*",
    "settings" : {
      "index.mapping.ignore_malformed" : true
    },
    "mappings" : {
      "_default_" : {
        "numeric_detection" : true,
        "_all" : { "enabled" : false },
        "properties" : {
          "message" : { "type" : "string" },
          "host" : { "type" : "string" },
          "UUID" : { "type" : "string", "index" : "not_analyzed" },
          "logdate" : { "type" : "string", "index" : "no" }
        }
      }
    }
  }
}

Brian -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/a145cb1e-4013-4a6b-a58d-9a42368d8107%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Splunk vs. Elastic search performance?
Mark, I've read one post (can't remember where) saying that the Node client was preferred, but I have also read that the HTTP interface adds minimal overhead. So yes, I am currently using logstash with the HTTP interface and it works fine. I also performed some experiments with clustering (not much, due to resource and time constraints) and used unicast discovery. Then I read someone who strongly recommended multicast discovery, and I started to feel like I'd gone down the wrong path. Then I watched the ELK webinar and heard that unicast discovery was preferred. I think it's not a big deal either way; it's whatever works best for your particular networking infrastructure. In addition, I was recently given this link: http://aphyr.com/posts/317-call-me-maybe-elasticsearch. It hasn't dissuaded me at all, but it is a thought-provoking read. I am a little confused by some things, though. In all of my high-performance banging on ES, even with my time-to-live test feature enabled, I never lost any documents at all. But I wasn't using auto-id; I was specifying my own unique ID. And when run in my 3-node cluster (slow due to being hosted by 3 VMs running on a dual-core machine), I still didn't lose any data. So I am not sure about the high data loss scenarios he describes in his missive; I have seen no evidence of any data loss due to false insert positives at all. Brian On Friday, June 20, 2014 6:30:27 PM UTC-4, Mark Walkom wrote: I wasn't aware that the elasticsearch_http output wasn't recommended? When I spoke to a few of the ELK devs a few months ago, they indicated that there was minimal performance difference, at the greater benefit of not being locked to specific LS+ES versioning. Regards, Mark Walkom -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f7621a17-9366-4166-9612-61415938013f%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: issues with file input from logstash to elastic - please read
Eitan, My recommendation is to use the stdin input in logstash and avoid its file input. For testing, you pipe the file into your logstash instance. But in production, you should run the GNU version of tail -F (uppercase F option) to correctly follow all forms of rotated logs, and then pipe that output into your logstash instance. I don't know just how robust logstash's file input is, but the GNU version of tail with the -F option is perfect, so there's no guesswork and no dependency on hope. Note that even Splunk has a currently open bug with losing data while trying to follow a rotated file. Also, I added the multiline processing to the filters; it didn't seem to work when applied as a stdin codec. Now it all works very well together. Anyway, that's what our group is doing. And yes, the logstash-users group https://groups.google.com/forum/#!forum/logstash-users is also rather active and is a good place for logstash-specific help. Brian -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9bbe59f4-93f1-4b59-8258-89301a8c5469%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
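A sketch of that pipeline shape (paths and the multiline pattern are illustrative; logstash 1.4-era config syntax assumed):

tail -F /var/log/myapp/app.log | bin/logstash -f shipper.conf

And a matching shipper.conf, with multiline done as a filter rather than a stdin codec:

input {
  stdin { }
}
filter {
  multiline {
    # Treat lines that begin with whitespace as continuations (e.g. Java stack traces).
    pattern => "^\s"
    what => "previous"
  }
}
output {
  elasticsearch_http {
    host => "localhost"
  }
}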
Re: Index template requires settings object even if its value is empty
By the way, I got a little ahead of myself in the previous post. In particular:

"settings" : {
  "index.mapping.ignore_malformed" : true,
  "index.query.default_field" : "message"
},

Apparently, when I added the index.query.default_field setting above and then removed the following option from my ES 1.2.1 start-up script, Kibana was no longer able to search on HTTP and instead required message:HTTP, because the _all field has also been disabled:

-Des.index.query.default_field=message

So I put the configuration option (above) back into my ES start-up script, and removed the index.query.default_field setting from the template (as it didn't seem to work). Not sure if this is a problem with my understanding (most likely) or a bug in ES (very unlikely). But I offer it to the experts for comment and correction. But however it should be, ES rocks and I've managed to get several people up and running with a one-button (as it were) build, install, load, and test. Awesome job, Elasticsearch.com! You make me look good! Brian -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d68e3db5-e651-4e57-85b8-fea70a5e8de9%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Index template requires settings object even if its value is empty
Alex, I am running ES version 1.2.1. It seemed to work (no errors in the logs), but I did it as an on-disk template and not via PUT. And without the settings, it behaved as if it wasn't there. The question is now moot, because I actually need the following settings:

"settings" : {
  "index.mapping.ignore_malformed" : true,
  "index.query.default_field" : "message"
},

I don't have a problem fiddling with local files; Elasticsearch, the wrapper script, and everything else I need is stored in a single zip archive that our operations team can easily install. So once I install it on my laptop and verify that it's working, it's 100% repeatable when installed on any QA or production server. I also configure logstash's elasticsearch_http output as follows:

manage_template => false

That way, I don't have to depend on logstash (or anything else) doing that for me. It's already done by the base ES install package. Brian

On Monday, June 16, 2014 8:03:33 AM UTC-4, Alexander Reelsen wrote: Hey, which ES version are you using? Seems to work with the latest version. You can also use the index template API, so you do not have to fiddle with local files (and copy them when adding new nodes).

PUT _template/automap
{
  "template": "*",
  "mappings": {
    "_default_": {
      "numeric_detection": true,
      "properties": {
        "message": { "type": "string" },
        "host": { "type": "string" },
        "@version": { "type": "string" }
      }
    }
  }
}

--Alex

On Tue, Jun 3, 2014 at 5:57 PM, Brian brian@gmail.com wrote: I am not sure if this is a problem or if it's OK. Working with the ELK stack I have switched direction, and instead of locking down the Elasticsearch mappings I am now using its automatic mapping functions. And by adding the following JSON template definition to the /path.to.config/templates/automap.json file I can get numeric fields automatically correctly mapped even though logstash always emits their values as strings ("45.6" instead of 45.6). Very nice!

{
  "automap" : {
    "template" : "*",
    "settings" : { },
    "mappings" : {
      "_default_" : {
        "numeric_detection" : true,
        "properties" : {
          "message" : { "type" : "string" },
          "host" : { "type" : "string" },
          "@version" : { "type" : "string" }
        }
      }
    }
  }
}

When I removed the "settings" : {} entirely, it was as if the template did not exist; the numeric detection was not enabled and all string values were seen as strings even if they contained numbers. Because all of the settings are being controlled within elasticsearch.yml and not the template (e.g. number of shards, number of replicas, and so on), eliminating the settings from the template is desired, even if I have to leave it in but set its value to the empty JSON object. If this is the way it's supposed to work, that's OK. But I couldn't find anything in the documentation about it, and just wanted to get a verification either way. Thanks! Brian -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ff4afb8e-c3e4-4772-aa48-bd6a651c78e8%40googlegroups.com. For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0ffa60d5-92a1-462f-b335-de83907060eb%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.