Re: IllegalStateException[field [DISPLAY_NAME] was indexed without position data]
Hi Ivan,

Running the following query returns the records below:

{ "query": { "match": { "DISPLAY_NAME": "Happy People" } } }

Result: https://gist.github.com/cheehoo/073ab926baa123b18224

But running the suggested span query:

{ "from": 100, "size": 100, "query": { "span_first": { "match": { "span_near": { "clauses": [ { "span_term": { "DISPLAY_NAME": "happy" } }, { "span_term": { "DISPLAY_NAME": "people" } } ], "slop": 1, "in_order": true } }, "end": 2 } } }

no results are returned. Any clues? :) Thanks.

On Wed, Apr 30, 2014 at 12:04 PM, Ivan Brusic i...@brusic.com wrote:

Do you have any documents that start with "happy people"? -- Ivan

On Tue, Apr 29, 2014 at 7:21 PM, chee hoo lum cheeho...@gmail.com wrote:

Hi Ivan, Tried with end set to 2 and 3 with no luck.

{ "from": 100, "size": 100, "query": { "span_first": { "match": { "span_near": { "clauses": [ { "span_term": { "DISPLAY_NAME": "happy" } }, { "span_term": { "DISPLAY_NAME": "people" } } ], "slop": 1, "in_order": true } }, "end": 2 } } }

The field uses the standard analyzer with stopwords set to _none_:

"DISPLAY_NAME": { "type": "string", "analyzer": "standard" }
"index.analysis.analyzer.standard.type": "standard"
"index.analysis.analyzer.standard.stopwords": "_none_"

Any clue on this? :) Thanks

On Wed, Apr 30, 2014 at 12:37 AM, Ivan Brusic i...@brusic.com wrote:

The end parameter is too low. It needs to be at a minimum the number of clauses in the span_near query.
-- Ivan

On Mon, Apr 28, 2014 at 7:05 PM, chee hoo lum cheeho...@gmail.com wrote:

Hi Ivan, Not able to get any result with the following query:

{ "from": 100, "size": 100, "query": { "span_first": { "match": { "span_near": { "clauses": [ { "span_term": { "DISPLAY_NAME": "happy" } }, { "span_term": { "DISPLAY_NAME": "people" } } ], "slop": 1, "in_order": true } }, "end": 1 } } }

Meanwhile I tried:

{ "from": 100, "size": 100, "query": { "span_first": { "match": { "span_term": { "DISPLAY_NAME": "happy" } }, "end": 1 } } }

and it returns:

"_index": "jdbc_dev", "_type": "media", "_id": "9556", "_score": 4.612431, "_source": { "DISPLAY_NAME": "Happy People", ...

Anything wrong with my first query? Thanks

On Tue, Apr 29, 2014 at 12:16 AM, Ivan Brusic i...@brusic.com wrote:

The main limitation of span queries is that they only operate on analyzed terms. The terms used in span_term must match the terms in the index. In your case, there is no single term "happy holiday" in your index, because the original document was tokenized into [happy, birthday, to, you]. You would need to do a span_near query of the two terms with a slop of 1 and in order. That span_near query will then be the argument to the span_first. Here is a good explanation of span queries in Lucene: http://searchhub.org/2009/07/18/the-spanquery/ -- Ivan

On Sun, Apr 27, 2014 at 11:24 PM, cyrilforce cheeho...@gmail.com wrote:

Hi Ivan, I recreated the mapping, re-indexed the documents, and it is now working fine. Thanks. Btw, I would like to ask how I could search two or more words in the span_first query, as I need it to support the following searches: 1) happy 2) happy holiday 3) happy birthday to you

{ "from": 100, "size": 100, "query": { "span_first": { "match": { "span_term": { "DISPLAY_NAME": "happy holiday" } }, "end": 1 } } }

returns an empty list even though we have documents whose DISPLAY_NAME starts with "happy holiday". Thanks.

On Sunday, April 27, 2014 2:55:37 AM UTC+8, cyrilforce wrote:

Hi Ivan, I am using version elasticsearch-0.90.1. Nope, we don't have any templates.
Not sure whether you are referring to the full index mapping; here are the gists: media mapping https://gist.github.com/cheehoo/11327970 and full index mapping https://gist.github.com/cheehoo/11327996 Thanks in advance.

On Sat, Apr 26, 2014 at 8:31 AM, Ivan Brusic i...@brusic.com wrote:

Your mapping looks correct. Which version are you running? Do you have any templates? Just to be on the safe side, can you provide the mapping that Elasticsearch is using (not the one you provided): http://localhost:9200/jdbc_dev/media/_mapping -- Ivan

On Fri, Apr 25, 2014 at 3:24 AM, cyrilforce cheeho...@gmail.com wrote:

Hi, I am trying to query some records via the span_first query as below:

{ "from": 100, "size": 100, "query": { "span_first": { "match": { "span_term": { "DISPLAY_NAME": "happy" } },
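Pulling the thread together, a corrected form of the suggested query, with valid JSON quoting and `end` at least as large as the number of clauses, would look like the sketch below. Note also that the queries in this thread page from hit 100 (`"from": 100`), so if fewer than 100 documents match, the result list will be empty regardless of the query; dropping `from` while debugging is worth trying.

```json
{
  "query": {
    "span_first": {
      "match": {
        "span_near": {
          "clauses": [
            { "span_term": { "DISPLAY_NAME": "happy" } },
            { "span_term": { "DISPLAY_NAME": "people" } }
          ],
          "slop": 0,
          "in_order": true
        }
      },
      "end": 2
    }
  }
}
```

Here `"slop": 0` requires the two terms to be adjacent (the thread used slop 1, which also allows one intervening term), and the span_term values must be the lowercased, analyzed terms actually stored in the index.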
Truncating scores
Hello everybody, I am using the function_score query to compute a custom score for items I am indexing into Elasticsearch. I use a native script (written in Java) to compute my score, based on a date (Date.getTime()). When I log what my native script returns, I get what I want, but when I look at the score of items returned by the query (I use the replace mode), I get a truncated number (e.g. if the score computed in the native script is 1 392 028 423 243, it comes back as 1 392 028 420 000 on the returned items). The problem is that I am losing milliseconds and seconds (I only keep the tens-of-seconds part). Losing milliseconds would be acceptable, but I can't lose seconds. Is this a limitation of Elasticsearch? Is there any way to work around this problem? Thanks in advance for your replies. Regards, Loïc Wenkin -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ccf7c19e-aa70-42ac-a4a4-d7174ab0de49%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: performance issue with script scoring with fields having a large array
Hello, Using _source in scripts is typically slow, because ES has to go to each stored document and extract fields from it. A faster approach is to use something like doc['field3'].values[12], which will use the field data cache (already loaded in memory, at least after the first run): http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-scripting.html#_document_fields More details about field data can be found here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.html Best regards, Radu -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/

On Wed, Apr 30, 2014 at 12:27 PM, NM n.maisonne...@gmail.com wrote:

I have documents with fields containing a large array. I would like to score according to the value of the nth element of such an array, but I get very slow responses (5 s) for only 10K documents indexed.

My mapping:

document { id: value, field2: string, field3: [ int_1, int_2, ..., int_10k ] } -- a large array of 10K integers

Assume I generated and indexed 10K documents with 1K random integer values in field3. I then use the following search query:

GET /test/document/_search { "query": { "function_score": { "script_score": { "script": "_source.fields3[12] * _source.fields3[11]" } } } }

=> takes 5000 ms. However, with plain Java objects and a simple nested loop -- for all documents, score[i] = doc[i].fields[12] * doc[i].fields[11], then sort by score -- it takes 50 ms. ES is 100x slower than a simple loop. How can I get similar performance with ES?
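The doc-values approach Radu describes would look roughly like this; it is a sketch using the field name from the question, and it assumes the array fits comfortably in the field data cache:

```json
{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "script_score": {
        "script": "doc['field3'].values[12] * doc['field3'].values[11]"
      }
    }
  }
}
```

One caveat worth checking: field data for a multi-valued field does not necessarily preserve the original order of the values in _source, so positional access through doc[...] may not address the same element as _source-based access does.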
Re: Elasticsearch Deployment architecture
It will work, but if you want to maintain HA then it'd make sense to keep your inputs separate from your outputs. At least, that's my take :) Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 30 April 2014 19:48, Norberto Meijome num...@gmail.com wrote:

Sending indexing requests to the SLB - is this less optimal, or would it outright fail?

On 30/04/2014 9:04 am, Mark Walkom ma...@campaignmonitor.com wrote:

For searches, yes. You'd want the indexing to go to the masters. Regards, Mark Walkom

On 30 April 2014 09:02, Norberto Meijome num...@gmail.com wrote:

On a related note, if you have separate SLB and master nodes, your main LB (say, haproxy) would be pointing to the SLBs, not the masters, right?

On 29/04/2014 8:40 pm, Dinesh Chandra shadow.on.f...@gmail.com wrote:

Hi, I am very new to Elasticsearch and am trying to deploy it in my dev environment. While there are many ways in which Elasticsearch can be deployed, my team and I have arrived at this architecture: 4 data nodes, 3 master-eligible nodes, 2 search load balancers (SLB). Now my questions are: Does it make sense to have the SLBs at all? Can I just have master nodes and have them perform the job of the SLBs too? Please enlighten me on a sensible Elasticsearch architecture!
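For reference, the node roles discussed in this thread map onto elasticsearch.yml settings; a minimal sketch of the three flavors:

```yaml
# Dedicated master-eligible node: coordinates the cluster, holds no data
node.master: true
node.data: false

# Data node: holds shards, not master-eligible
node.master: false
node.data: true

# Search load balancer ("client" node): neither master nor data,
# just routes and scatter-gathers requests
node.master: false
node.data: false
```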
Re: index binary files
Hello, Normally, you would send indexing requests to the REST API with the content you want Elasticsearch to index: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html If you want Elasticsearch to automatically fetch files from the file system for you, have a look at David's FileSystem River: https://github.com/dadoonet/fsriver Best regards, Radu -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/

On Tue, Apr 29, 2014 at 6:40 PM, anass benjelloun anass@gmail.com wrote:

Hello, I installed Elasticsearch and it works well: I can index and search XML and JSON content using Dev HTTP Client. I need your help to index binary files in Elasticsearch and then search them by content. I added mapper-attachments to Elasticsearch, but what I don't know is how to specify the folder of PDF or DOCX files to index, something like base64 or I don't know. Thanks for helping me.
Specify metadata per word/term in a string
Hi, Given a text, say "hello elastic search world", is there a way I can associate a field or some metadata with each word in the text, on which I can later query? For example: give a code number to each word, and then be able to search like text = hello AND code = 25, i.e. return all "hello" words which have 25 in their code metadata.
Re: Specify metadata per word/term in a string
Hello Neeraj, First of all, you can't return just "hello" from Elasticsearch. Elasticsearch works at the document level, which means if you search for "hello", you will get the document with the text "hello elasticsearch search world", not just "hello". The only way I can think of is to create a different document for each word, so a document would look like:

{ "word": "hello", "code": 25 }

That way you can get it to work. If you want to retrieve the text as well, structure it as follows:

{ "text": "hello from Elasticsearch", "words": [ { "word": "hello", "code": 25 }, { "word": "from", "code": 22 } ] }

where the words field is of nested type. Thanks, Vineeth

On Wed, Apr 30, 2014 at 4:36 PM, Neeraj Makam neeraj23...@gmail.com wrote:
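Vineeth's nested suggestion, written out as a mapping and query sketch (index and field names here are illustrative, not from the thread):

```json
{
  "mappings": {
    "doc": {
      "properties": {
        "text": { "type": "string" },
        "words": {
          "type": "nested",
          "properties": {
            "word": { "type": "string", "index": "not_analyzed" },
            "code": { "type": "integer" }
          }
        }
      }
    }
  }
}
```

A nested query can then match word and code within the same array element:

```json
{
  "query": {
    "nested": {
      "path": "words",
      "query": {
        "bool": {
          "must": [
            { "term": { "words.word": "hello" } },
            { "term": { "words.code": 25 } }
          ]
        }
      }
    }
  }
}
```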
Re: Specify metadata per word/term in a string
There is a related feature called payloads for terms. In Elasticsearch you can assign a payload to each term, e.g. numbers for custom scoring. See also https://github.com/elasticsearch/elasticsearch/issues/3772 and https://github.com/elasticsearch/elasticsearch/pull/4161 It uses the DelimitedPayloadTokenFilter: http://lucene.apache.org/core/4_7_0/analyzers-common/org/apache/lucene/analysis/payloads/DelimitedPayloadTokenFilter.html Jörg

On Wed, Apr 30, 2014 at 1:06 PM, Neeraj Makam neeraj23...@gmail.com wrote:
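The delimited-payload setup Jörg refers to would be wired up roughly as below: tokens are indexed as word|payload pairs, and the filter strips the payload off the term while storing it alongside. Analyzer and filter names here are illustrative:

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "payload_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [ "lowercase", "payloads" ]
        }
      },
      "filter": {
        "payloads": {
          "type": "delimited_payload_filter",
          "delimiter": "|",
          "encoding": "int"
        }
      }
    }
  }
}
```

A field analyzed with this would take text like "hello|25 world|3"; reading the payloads back at query time requires custom scoring, as discussed in the linked issue and pull request.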
Re: Truncating scores
Scores are Java floats, so I'd expect them to be less precise than the long that getTime returns. I believe you could look at sorting rather than scoring, or at reducing the precision of the top bits of your long. You know, y2k-bug style. The reason the score is a float is that for text scoring it's exact enough. Also, some of the Lucene data structures are actually more lossy than float: the field norm, IIRC, is a floating-point number packed into 8 bits rather than the float's 32. Nik

On Wed, Apr 30, 2014 at 5:56 AM, Loïc Wenkin loic.wen...@gmail.com wrote:
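Nik's float-precision point can be checked numerically: round-tripping the millisecond timestamp from the original post through a 32-bit float (the type Lucene uses for scores) snaps it to a multiple of 2^17 milliseconds, about two minutes of resolution at this magnitude, which matches the truncation Loïc observed. A self-contained sketch:

```python
import struct

ts = 1392028423243  # millisecond timestamp from the thread

# Lucene scores are 32-bit floats; simulate storing the long in one.
as_float = struct.unpack('f', struct.pack('f', float(ts)))[0]

# At this magnitude a float32 can only represent multiples of 2**17 ms,
# so the seconds and milliseconds are rounded away.
print(int(as_float), 'error in ms:', abs(int(as_float) - ts))
```

This also explains why dividing by a constant or subtracting a fixed epoch helps: both shrink the magnitude, which shrinks the spacing between representable float values.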
Re: SearchParseExceptions in Marvel monitoring cluster
Hi Mihir, This type of error typically occurs when the Marvel index doesn't contain the right data. I'm intrigued by the ClusterBlockException on your monitoring cluster. Can you gist the output of:

curl SERVER:9200/_cat/shards/?v

for both nodes of your Marvel cluster? Thx, Boaz

On Monday, April 28, 2014 2:43:30 PM UTC+2, Mihir M wrote:

Hi, We have 2 Elasticsearch clusters in our development environment. One of them is our development cluster, with 9 nodes: 4 data nodes (with 4 GB heap), 3 master-eligible nodes (default heap), and 2 search load balancers (default heap). The second is our monitoring cluster for storing Marvel data from the development cluster. This cluster has 2 nodes running with the default configuration. All the above nodes are running the latest ES version, 1.1.1, and the latest Marvel version, which is 1.1.0. Of late we have been seeing issues in the Marvel cluster. One of the nodes in the Marvel cluster throws the following exception continuously:

[.marvel-2014.04.25][0], node[dA2UtjgdQ1S55zgvQHOHYQ], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@24de815] org.elasticsearch.search.SearchParseException: [.marvel-2014.04.25][0]: from[-1],size[-1]: Parse Failure [Failed to parse source [{facets:{0:{date_histogram:{key_field:@timestamp,value_field:total.search.query_total,interval:1m},global:true,facet_filter:{fquery:{query:{filtered:{query:{query_string:{query:_type:indices_stats}},filter:{bool:{must:[{range:{@timestamp:{from:1398434986844,to:now}}}],size:50,query:{filtered:{query:{query_string:{query:_type:cluster_event OR _type:node_event}},filter:{bool:{must:[{range:{@timestamp:{from:1398434986844,to:now}}}],sort:[{@timestamp:{order:desc}},{@timestamp:{order:desc}}]}]]
at org.elasticsearch.search.SearchService.parseSource(SearchService.java:634)
at org.elasticsearch.search.SearchService.createContext(SearchService.java:507)
at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:480)
at
org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:324) at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteFetch(SearchServiceTransportAction.java:304) at org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryAndFetchAction.java:71) at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216) at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4.run(TransportSearchTypeAction.java:296) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: org.elasticsearch.search.facet.FacetPhaseExecutionException: Facet [0]: (value) field [total.search.query_total] not found at org.elasticsearch.search.facet.datehistogram.DateHistogramFacetParser.parse(DateHistogramFacetParser.java:186) at org.elasticsearch.search.facet.FacetParseElement.parse(FacetParseElement.java:93) at org.elasticsearch.search.SearchService.parseSource(SearchService.java:622) ... 10 more It keeps repeating at regular intervals. Also this is observed in only one of the 2 nodes of the monitoring cluster. Usually it is the master which shows this exception. Similar exceptions are observed in the Marvel dashboard - Cluster Overview page. Also in the development cluster in one of the Master nodes, we see ClusterBlockException [shard state 0 not initialized or recovered] for the monitoring cluster. Please explain why this is happening. One more thing to add, we are facing this problem ever since we migrated to ES 1.1.0. Before that while running 1.0.0, no such things were observed. Looking forward to your reply. 
- Regards

-- View this message in context: http://elasticsearch-users.115913.n3.nabble.com/SearchParseExceptions-in-Marvel-monitoring-cluster-tp4054926.html Sent from the ElasticSearch Users mailing list archive at Nabble.com.
Re: Truncating scores
Hello Nikolas, Thanks for your reply. I have done something like what you just explained: I divide the score by 5000 before returning it. Doing this, I remove the milliseconds and keep a precision of 5 seconds, which I expect to be enough. If it's still a problem, I may try to subtract some years from the date to get a smaller number. I think that sorting would be hard work, since I have something like this in my documents:

a: { b: { objectsSortableByDate: [ ... ] }, c: { objectsSortableByDate: [ ... ] } }

I want to filter my entities according to the smallest (or highest) date of any objectsSortableByDate (whether in b or in c), and sometimes I may have more than two nested objects, so I think the easiest way to sort is using a computed score. If you have a better idea, I will take it :) Loïc

Le mercredi 30 avril 2014 14:48:37 UTC+2, Nikolas Everett a écrit :
Re: Aggregation bug? Or user error?
This looks wrong indeed. By any chance, would you have a curl recreation of this issue?

On Tue, Apr 29, 2014 at 7:35 PM, mooky nick.minute...@gmail.com wrote:

It looks like a bug to me, but if it's user error, then obviously I can fix it a lot quicker :)

On Tuesday, 29 April 2014 13:04:53 UTC+1, mooky wrote:

I am seeing some very odd aggregation results, where the sum of the sub-aggregations is more than the parent bucket. Results:

"CSSX": {
  "doc_count": 24,
  "intentDate": {
    "buckets": [
      { "key": "Overdue", "to": 1.3981248E12, "to_as_string": "2014-04-22", "doc_count": 1, "ME": { "doc_count": 0 }, "NOT_ME": { "doc_count": 24 } },
      { "key": "May", "from": 1.3981248E12, "from_as_string": "2014-04-22", "to": 1.4006304E12, "to_as_string": "2014-05-21", "doc_count": 23, "ME": { "doc_count": 0 }, "NOT_ME": { "doc_count": 24 } },
      { "key": "June", "from": 1.4006304E12, "from_as_string": "2014-05-21", "to": 1.4033088E12, "to_as_string": "2014-06-21", "doc_count": 0, "ME": { "doc_count": 0 }, "NOT_ME": { "doc_count": 24 } }
    ]
  }
},

I wouldn't have thought that to be possible at all. Here is the request that generated the dodgy results.
"CSSX": { "filter": { "and": { "filters": [ { "type": { "value": "inventory" } }, { "term": { "isAllocated": false } }, { "term": { "intentMarketCode": "CSSX" } }, { "terms": { "groupCompanyId": [ "0D13EF2D0E114D43BFE362F5024D8873", "0D593DE0CFBE49BEA3BF5AD7CD965782", "1E9C36CC45C64FCAACDEE0AF4FB91FBA", "33A946DC2B0E494EB371993D345F52E4", "6471AA50DFCF4192B8DD1C2E72A032C7", "9FB2FFDC0FF0797FE04014AC6F0616B6", "9FB2FFDC0FF1797FE04014AC6F0616B6", "9FB2FFDC0FF2797FE04014AC6F0616B6", "9FB2FFDC0FF3797FE04014AC6F0616B6", "9FB2FFDC0FF5797FE04014AC6F0616B6", "9FB2FFDC0FF6797FE04014AC6F0616B6", "AFE0FED33F06AFB6E04015AC5E060AA3" ] } }, { "not": { "filter": { "terms": { "status": [ "Cancelled", "Completed" ] } } } } ] } }, "aggregations": { "intentDate": { "date_range": { "field": "intentDate", "ranges": [ { "key": "Overdue", "to": "2014-04-22" }, { "key": "May", "from": "2014-04-22", "to": "2014-05-21" }, { "key": "June", "from": "2014-05-21", "to": "2014-06-21" } ] }, "aggregations": { "ME": { "filter": { "term": { "trafficOperatorSid": "S-1-5-21-20xx ...

-- Adrien Grand
Date range query ignores month
Hi guys, I've been using Elasticsearch as my data store and I have lots of documents in it. My problem is that Elasticsearch seems to ignore the month part of my date field, so I can't get the search response I expect. Here is what I have in my index and my query; please tell me if I'm wrong:

curl -XPUT 'http://localhost:9200/tt6/' -d '{}'
curl -XPUT 'http://localhost:9200/tt6/tweet/_mapping' -d '{"tweet" : {"properties" : {"date" : {"type" : "date", "format" : "-MM-DD HH:mm:ss"}}}}'
curl -XPUT 'http://localhost:9200/tt6/tweet/1' -d '{"date" : "2014-02-14 04:00:45"}'
curl -XGET 'http://localhost:9200/tt6/_search' -d '
{
  "query" : {
    "bool" : {
      "must" : [
        { "range" : { "tweet.date" : { "from" : "2014-12-01 00:00:00", "to" : "2014-12-30 00:00:00" } } }
      ],
      "must_not" : [],
      "should" : []
    }
  },
  "from" : 0,
  "size" : 10,
  "sort" : [],
  "facets" : {}
}'

And my response is:

{ "took" : 3, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 1, "max_score" : 1, "hits" : [ { "_index" : "tt6", "_type" : "tweet", "_id" : "1", "_score" : 1, "_source" : { "date" : "2014-02-14 04:00:45", "name" : "test" } } ] } }

Given the date range of 1 December 2014 to 30 December 2014, this document should not match, but it is returned anyway. Any help will be appreciated. Regards, Fatih.
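A likely explanation (my guess; the thread itself doesn't confirm it): the mapping's date format looks truncated (it starts at `-MM-DD`, with no year pattern), and in the Joda-style patterns Elasticsearch uses, `DD` means day-of-year rather than day-of-month (`dd`). A mapping along these lines should make range queries honour the month:

```json
{
  "tweet" : {
    "properties" : {
      "date" : {
        "type" : "date",
        "format" : "yyyy-MM-dd HH:mm:ss"
      }
    }
  }
}
```

Note that the format of an existing date field can't simply be changed in place; the index would need to be recreated and the documents reindexed.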
Re: Registering node event listeners
Would the DiscoveryService solve my initial problem or only get around constructing a DiscoveryNodesProvider? DiscoveryService only uses the InitialStateDiscoveryListener, which doesn't publish interesting events. I won't be near a computer in the next few days to test. -- Ivan On Wed, Apr 30, 2014 at 4:40 AM, joergpra...@gmail.com joergpra...@gmail.com wrote: Have you looked at InternalNode.java? From my understanding you could try to implement your own DiscoveryModule with DiscoveryService and start it like this: DiscoveryService discoService = injector.getInstance(DiscoveryService.class).start(); Jörg On Wed, Apr 30, 2014 at 12:17 AM, Ivan Brusic i...@brusic.com wrote: I am looking to transition a piece of my search infrastructure from polling the cluster's health status to hopefully receiving notifications whenever an event occurs. Using the TransportService, I registered various relevant listeners, but none of them are triggered. Here is the gist of the code: https://gist.github.com/brusic/2dcced28e0ed753b6632 Most of it I stole^H^H^H^H^Hborrowed from ZenDiscovery. I am assuming something is not quite right with the TransportService. I tried using both a node client and a master-less/data-less client. I also suspect that the DiscoveryNodesProvider might not have been initialized correctly, but I am primarily after the events from NodesFaultDetection, which does not use the DiscoveryNodesProvider. I know I am missing something obvious, but I cannot quite spot it. Is there perhaps a different route using the TransportClient? Cheers, Ivan
Substring match in search term order using Elasticsearch
Posted the same question on Stack Overflow (http://stackoverflow.com/questions/23244796/substring-match-in-search-term-order-using-elasticsearch) but am still looking for an answer. I'm new to Elasticsearch. I want to perform substring/partial word matching, and I want the results returned in a particular order. To explain my problem, here is how I create my index and mappings, and the records I use.

Creating index and mappings:

PUT /my_index1
{
  "settings" : {
    "analysis" : {
      "filter" : {
        "trigrams_filter" : { "type" : "ngram", "min_gram" : 3, "max_gram" : 3 }
      },
      "analyzer" : {
        "trigrams" : {
          "type" : "custom",
          "tokenizer" : "standard",
          "filter" : [ "lowercase", "trigrams_filter" ]
        }
      }
    }
  },
  "mappings" : {
    "my_type1" : {
      "properties" : {
        "text" : { "type" : "string", "analyzer" : "trigrams" }
      }
    }
  }
}

Bulk record insert:

POST /my_index1/my_type1/_bulk
{ "index" : { "_id" : 1 }}
{ "text" : "men's shaver" }
{ "index" : { "_id" : 2 }}
{ "text" : "men's foil shaver" }
{ "index" : { "_id" : 3 }}
{ "text" : "men's foil advanced shaver" }
{ "index" : { "_id" : 4 }}
{ "text" : "norelco men's foil advanced shaver" }
{ "index" : { "_id" : 5 }}
{ "text" : "men's shavers" }
{ "index" : { "_id" : 6 }}
{ "text" : "women's shaver" }
{ "index" : { "_id" : 7 }}
{ "text" : "women's foil shaver" }
{ "index" : { "_id" : 8 }}
{ "text" : "women's foil advanced shaver" }
{ "index" : { "_id" : 9 }}
{ "text" : "norelco women's foil advanced shaver" }
{ "index" : { "_id" : 10 }}
{ "text" : "women's shavers" }

Now I want to search for "en's shaver", using the following query:

POST /my_index1/my_type1/_search
{
  "query" : {
    "match" : {
      "text" : {
        "query" : "en's shaver",
        "minimum_should_match" : "100%"
      }
    }
  }
}

I want the results in the following sequence:

1. men's shaver -- closest match, same keyword order as "en's shaver"
2. women's shaver -- closest match, same keyword order as "en's shaver"
3. men's foil shaver -- distance increased by 1
4. women's foil shaver -- distance increased by 1
5. men's foil advanced shaver -- distance increased by 2
6. women's foil advanced shaver -- distance increased by 2
7. men's shavers -- substring match for "shavers"
8. women's shavers -- substring match for "shavers"

I'm running the following query, but it does not give me the results in the order I want:

POST /my_index1/my_type1/_search
{
  "query" : {
    "query_string" : {
      "default_field" : "text",
      "query" : "men's shaver",
      "minimum_should_match" : "90%"
    }
  }
}
Using snapshotrestore to separate indexing from searching
As I posted before, our system does not fit very well into a cluster structure, because we have many small indices (about 1k indices with an average of 6k records each). Our guess is that with so many small indices, the cluster spent too much time and resources deciding which node should be master, where to locate absurdly small shards, etc. Bottom line: the cluster always ended up not working right. BTW, I suspect that with a few advanced tuning options (shard routing and the like) we might be able to make it work again, but unfortunately we can't find that kind of knowledge in the standard docs. If any of you have any hint on this, it would be greatly appreciated! Anyway, we need to scale the system somehow, and this is what we've come up with: - Our indices can have configuration variations that make a reindex necessary at any time. It doesn't happen a lot, but it happens, and with 1k indices it's bound to happen. - Index data is regenerated every day, so every day the whole set of indices is re-created (we figured it's much faster to recreate an index than to update an existing one, replacing every one of its records). We would like the machines used for searching to be used only for that, and never for indexing/reindexing, because we don't want the user experience to suffer when searching against a server that is already loaded with heavy indexing. In our ideal scenario, indexing/reindexing would be done on dedicated machines, which can be as many as needed, and searching would be done on different machines. We plan to use the snapshot/restore feature for that. Any time an index/reindex is needed, it would be done on one of the indexing machines; the fresh index would then be snapshotted and restored to the search machines afterwards. We should have some client-side control to make sure only one snapshot process runs at a time; it's my understanding that this is not needed for the restore process (i.e. you can have more than one restore process running on a cluster). Individual item indexing can happen occasionally, but I figure when that happens we can just index to both the search machines and the indexing machines, because it's never going to be big. (Please read "cluster" wherever I wrote "machine".) How crazy does this whole thing sound? Is there any other way we can get some scalability?
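For reference, the per-index handoff described above could be sketched like this (repository name, path, and index name are made up for illustration; snapshot/restore requires ES 1.0+ and a repository location visible to both the indexing and the search cluster):

```
# Register the same filesystem repository on both clusters (hypothetical path)
PUT /_snapshot/handoff
{ "type" : "fs", "settings" : { "location" : "/mnt/shared/es_handoff" } }

# On the indexing cluster: snapshot one freshly built index
PUT /_snapshot/handoff/idx_0042_v2?wait_for_completion=true
{ "indices" : "idx_0042_v2" }

# On the search cluster: restore it
POST /_snapshot/handoff/idx_0042_v2/_restore
{ "indices" : "idx_0042_v2" }
```

Note that Elasticsearch itself only runs one snapshot per cluster at a time, which lines up with the client-side control mentioned above.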
Lucene Date Range Query in Kibana
Is there a way in Kibana or Lucene to define a date range query as "today minus 60 days"? Something along the logical lines of: visit_date:[*-60 TO *]
Re: Security of ES
Hi, Elasticsearch doesn't support any form of authentication or authorization at the moment. The way users usually deal with this is by giving access to Elasticsearch through a proxy that handles security based on the path of the URL. On Wed, Apr 30, 2014 at 5:56 PM, Patrick Proniewski elasticsea...@patpro.net wrote: Hello, As a BOfH, I'm quite used to providing auth-based access to IT resources. As CISO I must guarantee that users get only what they need, especially regarding sensitive content. Unfortunately I can't find anything about authentication and security in the ES documentation. It looks like the product is designed like memcached: it's there and free to use. Is there any way to provide some partitioning inside an ES cluster, so that we can share the cluster without sharing the data? thanks, Patrick -- Adrien Grand
Re: Security of ES
Yes. For now, you have to deal with security yourself. So: secure URLs using Nginx, for example, and use aliases, which let you expose an alias URL rather than the direct index URL. Use filters in aliases. Example: let's say you have a groupid field in your documents and a doc index. Doc A belongs to groupid marketing; doc B belongs to groupid finances. Create an alias marketing which uses the doc index with a prebuilt filter on groupid = marketing. Same for finances. Then secure your URLs using Nginx and let users access only the URLs (aliases) they should see. My 2 cents. -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet | @elasticsearchfr On 30 April 2014 at 17:56:10, Patrick Proniewski (elasticsea...@patpro.net) wrote: Hello, As a BOfH, I'm quite used to providing auth-based access to IT resources. As CISO I must guarantee that users get only what they need, especially regarding sensitive content. Unfortunately I can't find anything about authentication and security in the ES documentation. It looks like the product is designed like memcached: it's there and free to use. Is there any way to provide some partitioning inside an ES cluster, so that we can share the cluster without sharing the data? thanks, Patrick
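David's filtered-alias suggestion can be sketched as follows (the `docs` index and `groupid` values are hypothetical, taken from his example; syntax as of ES 1.x):

```json
POST /_aliases
{
  "actions" : [
    { "add" : { "index" : "docs", "alias" : "marketing",
                "filter" : { "term" : { "groupid" : "marketing" } } } },
    { "add" : { "index" : "docs", "alias" : "finances",
                "filter" : { "term" : { "groupid" : "finances" } } } }
  ]
}
```

The proxy then only exposes /marketing/_search to the marketing group and /finances/_search to finance; neither group ever sees the underlying /docs index.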
Re: ES and SAN storage
I think anyone will find it difficult to answer such questions, just because several factors drive the decision: latency requirements, high-availability requirements, how shared the SAN storage is, the impact of somebody stealing IO under the hood, etc. The best way is to develop a test model and test it out. Look at the cluster settings for how to disable/enable shard allocation. On Wed, Apr 30, 2014 at 8:47 AM, Patrick Proniewski elasticsea...@patpro.net wrote: Hello, I'm still testing ES at a very small scale (1 node on a multipurpose server), but I would like to extend its use at work as a backend for logstash. It means that the LS+ES cluster would have to ingest a few GB of data every day, up to 15 or 20 GB later if things go well. I'm doing all this as a side project: no investment apart from work hours. I will recycle blades and storage we plan to decommission from our virtualization farm. So I'm likely to end up with 2 or 3 dual-Xeon blades, but no real internal storage (an SD card), and a LUN on a SAN. How does ES behave in shared storage conditions? What are the best practices about nodes/shards/replicas/...? The intended audience is the operations team, so fewer than 10 people. So no big search concurrency, but probably mostly deep searches and ill-designed queries :) thanks, Patrick
Multiple or per field highlight type
I have a mapping where I set one field's term_vector to with_positions_offsets. I would then like to search with highlighting on all the fields; is that possible?
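One approach (my suggestion, not from the thread; the per-field `type` option exists in recent 0.90.x/1.x releases, so check your version): set the highlighter type per field in the `highlight` section, so the field with term vectors uses the fast vector highlighter while the others fall back to the plain highlighter. Field names here are hypothetical:

```json
{
  "query" : { "match" : { "_all" : "search terms" } },
  "highlight" : {
    "fields" : {
      "body"  : { "type" : "fvh" },
      "title" : { "type" : "plain" }
    }
  }
}
```

Without an explicit `type`, Elasticsearch picks the highlighter automatically, using the fast vector highlighter only where the field's term vectors allow it.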
Re: ES and SAN storage
Well, then maybe my questions were not precise enough. My first goal was to make sure ES works when sharing a unique storage volume across all nodes. My second goal was to learn whether each node requires its own dedicated file tree, or whether you can put all the files together as if there were only one ES node. Does it make sense to have replicas when the filesystem IOs are ultimately shared? Does moving a shard from one node to another pass the data through the CPU, or is ES smart enough to just hand over a pointer to the file? On 30 avr. 2014, at 18:33, Mohit Anchlia wrote: I think anyone will find it difficult to answer such questions, just because several factors drive the decision: latency requirements, high-availability requirements, how shared the SAN storage is, the impact of somebody stealing IO under the hood, etc. The best way is to develop a test model and test it out. Look at the cluster settings for how to disable/enable shard allocation. On Wed, Apr 30, 2014 at 8:47 AM, Patrick Proniewski elasticsea...@patpro.net wrote: Hello, I'm still testing ES at a very small scale (1 node on a multipurpose server), but I would like to extend its use at work as a backend for logstash. It means that the LS+ES cluster would have to ingest a few GB of data every day, up to 15 or 20 GB later if things go well. I'm doing all this as a side project: no investment apart from work hours. I will recycle blades and storage we plan to decommission from our virtualization farm. So I'm likely to end up with 2 or 3 dual-Xeon blades, but no real internal storage (an SD card), and a LUN on a SAN. How does ES behave in shared storage conditions? What are the best practices about nodes/shards/replicas/...? The intended audience is the operations team, so fewer than 10 people. So no big search concurrency, but probably mostly deep searches and ill-designed queries :) thanks, Patrick
Re: Security of ES
Thanks Adrien. On 30 avr. 2014, at 18:02, Adrien Grand wrote: Hi, Elasticsearch doesn't support any form of authentication or authorization at the moment. The way users usually deal with this is by giving access to Elasticsearch through a proxy that handles security based on the path of the URL. On Wed, Apr 30, 2014 at 5:56 PM, Patrick Proniewski elasticsea...@patpro.net wrote: Hello, As a BOfH, I'm quite used to providing auth-based access to IT resources. As CISO I must guarantee that users get only what they need, especially regarding sensitive content. Unfortunately I can't find anything about authentication and security in the ES documentation. It looks like the product is designed like memcached: it's there and free to use. Is there any way to provide some partitioning inside an ES cluster, so that we can share the cluster without sharing the data?
Re: Security of ES
Hmmm, OK, I'll have to think about this. I do get the proxy part; very easy, I've been doing that kind of stuff for eons. Now you write that I can discriminate URLs by injecting an arbitrary field into my data and creating an alias that names a prebuilt filter. I discovered aliases just 2 hours ago, so I'll have to dive in to understand exactly how this works, and in particular how it can be used in a logstash install. Thanks for the tip. On 30 avr. 2014, at 18:04, David Pilato wrote: Yes. For now, you have to deal with security yourself. So: secure URLs using Nginx, for example, and use aliases, which let you expose an alias URL rather than the direct index URL. Use filters in aliases. Example: let's say you have a groupid field in your documents and a doc index. Doc A belongs to groupid marketing; doc B belongs to groupid finances. Create an alias marketing which uses the doc index with a prebuilt filter on groupid = marketing. Same for finances. Then secure your URLs using Nginx and let users access only the URLs (aliases) they should see. My 2 cents. -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet | @elasticsearchfr On 30 April 2014 at 17:56:10, Patrick Proniewski (elasticsea...@patpro.net) wrote: Hello, As a BOfH, I'm quite used to providing auth-based access to IT resources. As CISO I must guarantee that users get only what they need, especially regarding sensitive content. Unfortunately I can't find anything about authentication and security in the ES documentation. It looks like the product is designed like memcached: it's there and free to use. Is there any way to provide some partitioning inside an ES cluster, so that we can share the cluster without sharing the data? thanks, Patrick
Re: ES and SAN storage
I'll try and answer as much as I know: ES shouldn't have any issues working with SAN, NFS or EBS. Yes, each node needs its own unique file path; nodes don't share files with other nodes. Replicas here only make sense if you are solving for a VM or node failure per se, or if your SAN storage comes from a different array. I don't follow your last question. On Wed, Apr 30, 2014 at 10:04 AM, Patrick Proniewski elasticsea...@patpro.net wrote: Well, then maybe my questions were not precise enough. My first goal was to make sure ES works when sharing a unique storage volume across all nodes. My second goal was to learn whether each node requires its own dedicated file tree, or whether you can put all the files together as if there were only one ES node. Does it make sense to have replicas when the filesystem IOs are ultimately shared? Does moving a shard from one node to another pass the data through the CPU, or is ES smart enough to just hand over a pointer to the file? On 30 avr. 2014, at 18:33, Mohit Anchlia wrote: I think anyone will find it difficult to answer such questions, just because several factors drive the decision: latency requirements, high-availability requirements, how shared the SAN storage is, the impact of somebody stealing IO under the hood, etc. The best way is to develop a test model and test it out. Look at the cluster settings for how to disable/enable shard allocation. On Wed, Apr 30, 2014 at 8:47 AM, Patrick Proniewski elasticsea...@patpro.net wrote: Hello, I'm still testing ES at a very small scale (1 node on a multipurpose server), but I would like to extend its use at work as a backend for logstash. It means that the LS+ES cluster would have to ingest a few GB of data every day, up to 15 or 20 GB later if things go well. I'm doing all this as a side project: no investment apart from work hours. I will recycle blades and storage we plan to decommission from our virtualization farm. So I'm likely to end up with 2 or 3 dual-Xeon blades, but no real internal storage (an SD card), and a LUN on a SAN. How does ES behave in shared storage conditions? What are the best practices about nodes/shards/replicas/...? The intended audience is the operations team, so fewer than 10 people. So no big search concurrency, but probably mostly deep searches and ill-designed queries :) thanks, Patrick
Significant Term aggregation
Hi: I have been trying to use (and successfully did use) the significant terms aggregation in release 1.1.0. The blog post about this feature, http://www.elasticsearch.org/blog/significant-terms-aggregation/, was extremely helpful. Since this feature is in an experimental stage and the authors had requested feedback, and since I don't know how else to provide feedback on specific features, I am resorting to posting on this group. I had posted on a different thread about accessing the TF-IDF scores for terms so that I could investigate ways to enhance my queries. This led me to look at the experimental significant terms aggregation. It does what it says quite well, and I am glad this functionality exists. However, I would like to suggest some possible enhancements. What I noticed in my aggregation results is a lot of stopwords (a, an, the, at, and, etc.) being included as significant terms. Perhaps allow stopword lists, so that these words are excluded from the significant-term calculation. (The significance is calculated based on how many times a term appears in the query result vs. how many times it appears in the whole index.) For common stopwords, this calculation is going to make them look very significant. Another possible enhancement would be phrase significance (multi-term rather than single-term significance). In the blog post, a similar effect is obtained by highlighting the terms identified as significant, but it would be nice to determine that just by looking at the buckets. Cheers, and thanks for all the fish. Ramdev
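A possible workaround for the stopword noise (my suggestion, not from the thread; check whether your release supports it): significant_terms accepts terms-agg-style value filtering, so an `exclude` regex can drop a stopword list from the buckets. Alternatively, index the field with an analyzer that strips stopwords. A hypothetical sketch:

```json
{
  "query" : { "match" : { "text" : "bird flu" } },
  "aggregations" : {
    "keywords" : {
      "significant_terms" : {
        "field" : "text",
        "exclude" : "a|an|and|at|the|of|to|in"
      }
    }
  }
}
```

The exclusion only hides the terms from the buckets; it does not change how the remaining terms are scored.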
Re: Lucene Date Range Query in Kibana
Lucene, and hence Elasticsearch, and hence Kibana, allow a date range to be queried as [NOW-60DAY TO NOW], similar to what you said. On Wednesday, 30 April 2014 10:37:33 UTC-5, Uli Bethke wrote: Is there a way in Kibana or Lucene to define a date range query as "today minus 60 days"? Something along the logical lines of: visit_date:[*-60 TO *]
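A note on syntax (my addition, not from the thread): in Elasticsearch's query_string date math, the anchor is lowercase `now` and the units are short suffixes such as `d` for days, so the query would typically be written:

```
visit_date:[now-60d TO now]
```

Kibana's time picker can also express a relative range like "last 60 days" without a hand-written query.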
Re: Substring match in search term order using Elasticsearch
what happens when you query as you indicated? Did you try a wildcard query? Also, perhaps an analyzer with the shingle token filter (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-shingle-tokenfilter.html#analysis-shingle-tokenfilter) would work better for your purposes? Ramdev

On Wednesday, 30 April 2014 09:15:35 UTC-5, Kruti Shukla wrote: I posted the same question on Stack Overflow (http://stackoverflow.com/questions/23244796/substring-match-in-search-term-order-using-elasticsearch) but am still looking for an answer. I'm new to Elasticsearch and I want to perform substring/partial word matching, with results returned in a particular order. To explain my problem I will show how I create my index and mappings, and the records I use.

Creating index and mappings:

PUT /my_index1
{
  "settings": {
    "analysis": {
      "filter": {
        "trigrams_filter": { "type": "ngram", "min_gram": 3, "max_gram": 3 }
      },
      "analyzer": {
        "trigrams": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "lowercase", "trigrams_filter" ]
        }
      }
    }
  },
  "mappings": {
    "my_type1": {
      "properties": {
        "text": { "type": "string", "analyzer": "trigrams" }
      }
    }
  }
}

Bulk record insert:

POST /my_index1/my_type1/_bulk
{ "index": { "_id": 1 }}
{ "text": "men's shaver" }
{ "index": { "_id": 2 }}
{ "text": "men's foil shaver" }
{ "index": { "_id": 3 }}
{ "text": "men's foil advanced shaver" }
{ "index": { "_id": 4 }}
{ "text": "norelco men's foil advanced shaver" }
{ "index": { "_id": 5 }}
{ "text": "men's shavers" }
{ "index": { "_id": 6 }}
{ "text": "women's shaver" }
{ "index": { "_id": 7 }}
{ "text": "women's foil shaver" }
{ "index": { "_id": 8 }}
{ "text": "women's foil advanced shaver" }
{ "index": { "_id": 9 }}
{ "text": "norelco women's foil advanced shaver" }
{ "index": { "_id": 10 }}
{ "text": "women's shavers" }

Now I want to search for "en's shaver", using the following query:

POST /my_index1/my_type1/_search
{
  "query": {
    "match": {
      "text": {
        "query": "en's shaver",
        "minimum_should_match": "100%"
      }
    }
  }
}

I want the results in the following sequence:

1. men's shaver -- closest match, same search keyword order as "en's shaver"
2. women's shaver -- closest match, same search keyword order as "en's shaver"
3. men's foil shaver -- distance increased by 1
4. women's foil shaver -- distance increased by 1
5. men's foil advanced shaver -- distance increased by 2
6. women's foil advanced shaver -- distance increased by 2
7. men's shavers -- substring match for "shavers"
8. women's shavers -- substring match for "shavers"

I'm currently performing the following query; it is not giving me results in the order I want:

POST /my_index1/my_type1/_search
{
  "query": {
    "query_string": {
      "default_field": "text",
      "query": "men's shaver",
      "minimum_should_match": "90%"
    }
  }
}

-- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/df570460-9e71-4c4b-9208-c5a7f467cde5%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
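Since the question turns on how the trigram analyzer tokenizes these strings, here is a minimal Python sketch of the analysis chain described in the post (lowercase plus character 3-grams). It only approximates the standard tokenizer by splitting on whitespace; it is not Lucene's actual implementation:

```python
def trigrams(token, n=3):
    """Emit the character n-grams the ngram filter would produce for one token."""
    token = token.lower()
    return [token[i:i + n] for i in range(len(token) - n + 1)]

def analyze(text):
    # crude stand-in for the standard tokenizer: split on whitespace
    grams = []
    for token in text.split():
        grams.extend(trigrams(token))
    return grams

query_grams = set(analyze("en's shaver"))
doc_grams = set(analyze("men's shaver"))

# every query trigram also occurs in the document, so even with
# minimum_should_match: 100% the document matches
print(query_grams <= doc_grams)  # True
```

This also hints at why the ordering is hard to control with a plain match query: all the documents above share most of the query's trigrams, so their relevance scores differ only slightly and not by keyword order.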
Cannot asynchronously update replica settings over many tables (ES 0.90.7)
Hi all, I am trying to grow my replicas from 0 to 2 across about 300 tables. I'm doing this by asynchronously issuing an UpdateSettingsRequest (through the Java client) for each table. The first 100 go through fine (each responding with an UpdateSettingsResponse), but the final ~200 fail with this exception: Failure is org.elasticsearch.transport.RemoteTransportException: [my-cluster][inet[/w.x.y.z:9300]][indices/settings/update] We're using ES version 0.90.7. Any ideas what might be clogging the pipes?
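For what it's worth, one thing to rule out is simply flooding the cluster with ~300 simultaneous settings requests. A hedged sketch of bounding the number of in-flight updates with a worker pool; `update_replicas` below is a hypothetical stand-in for the real client call, not an Elasticsearch API:

```python
from concurrent.futures import ThreadPoolExecutor

def update_replicas(index):
    # hypothetical stand-in for issuing one UpdateSettingsRequest;
    # the real call would go through the Java (or REST) client, e.g.
    # PUT /{index}/_settings with {"number_of_replicas": 2}
    return (index, 2)

indices = ["table-%03d" % i for i in range(300)]

# cap concurrency at 10 instead of submitting all 300 requests at once
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(update_replicas, indices))

print(len(results))  # 300
```

If the failures disappear under a bounded pool, the original problem was likely queue exhaustion on the management side rather than anything wrong with the requests themselves.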
help with jdbc rivers and type mapping
i can't seem to understand how to fully set up my type mappings while using the JDBC river and SQL Server. Here's an example:

PUT /_river/mytest_river/_meta
{
  "type": "jdbc",
  "jdbc": {
    "url": "jdbc:sqlserver://mydbserver:1433;databaseName=mydatabase",
    "user": "myuser",
    "password": "xxx",
    "sql": "select * from dbo.musicalbum (nolock)",
    "strategy": "oneshot",
    "index": "myindex",
    "type": "album",
    "bulk_size": 100,
    "max_retries": 5,
    "max_retries_wait": "30s",
    "max_bulk_requests": 5,
    "bulk_flush_interval": "5s",
    "type_mapping": {
      "album": {
        "properties": {
          "AlbumDescription": { "type": "string" },
          "AlbumID": { "type": "string" },
          "Artist": { "type": "string" },
          "Genre": { "type": "string", "index": "not_analyzed" },
          "Label": { "type": "string" },
          "Title": { "type": "string" },
          "_id": { "path": "AlbumID" }
        }
      }
    }
  }
}

So you can see I've specified both a select statement (which normally would dynamically produce the mapping for me) and also a type mapping. In the type mapping I've tried to specify that I want the _id to be the same as AlbumID, and also that I want Genre to be not_analyzed. It ends up throwing multiple errors, only indexing one document, and not creating my full mapping. Here's what the mapping ends up looking like (skipping some of the columns altogether!):

{
  "myindex": {
    "mappings": {
      "album": {
        "properties": {
          "AlbumDescription": { "type": "string" },
          "AlbumID": { "type": "string" },
          "Artist": { "type": "string" },
          "Genre": { "type": "string" },
          "Title": { "type": "string" }
        }
      }
    }
  }
}

Any assistance would be helpful. It's driving me nuts.
Re: Sense on github abandoned?
Agree 100%. Sense must return to the Chrome Store! On Tuesday, 29 April 2014 11:52:49 UTC-3, Joshua Worden wrote: Would love to see this return to the Chrome store. I was rather surprised to see it gone when getting another developer started with Elasticsearch. Even if it was buggy, it was the best way to get started.
Re: ES and SAN storage
On 30 Apr 2014, at 19:34, Mohit Anchlia wrote: I'll try and answer as much as I know: ES shouldn't have any issues working with SAN, NFS or EBS. Yes, each node needs its own unique file path; nodes don't share files with other nodes. ok. Replicas in this case only make sense if you are solving for a VM or node failure per se. Or it also makes sense if you have SAN storage coming from a different array. ok. I don't follow your last question. My English is limited, sorry. As far as I understand ES, some shard balancing occurs in the background: when some shards are created or deleted, others will move from node to node so the number of shards stays even across nodes. When storage is isolated per node, moving a shard to another node requires the files to go through the source node's CPU/RAM, then the network, then the CPU/RAM of the remote node, then storage. It would be very nice in a shared-storage scenario if the shard were not moved through fs-cpu-ram-network-cpu-ram-fs but through a simple rename-and-tell action. Does that make sense? On Wed, Apr 30, 2014 at 10:04 AM, Patrick Proniewski elasticsea...@patpro.net wrote: Well, then maybe my questions were not precise enough. My first goal was to make sure ES does work sharing a unique storage for all nodes. My second goal was to learn whether each node requires its own dedicated file tree, or whether you can put all files together as if there were only one ES node. Does it make sense to have replicas when filesystem IOs are ultimately shared? Does moving a shard from one node to another make the data pass through the CPU, or is ES smart enough to just pass a pointer to the files?
Re: help with jdbc rivers and type mapping
Thanks for the report. Does it work if you create the index with the custom mapping beforehand, with a tool like curl? The JDBC river will use the existing index then. Jörg On Wed, Apr 30, 2014 at 9:56 PM, Eric Sims eric.sims.aent@gmail.com wrote: i can't seem to understand how to fully set up my type mappings while using jdbc rivers and sql server. [...]
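Following Jörg's suggestion of pre-creating the index, here is a sketch of what the create-index body could look like, built as a Python dict so quoting mistakes surface early. The field list mirrors the river's type_mapping in the quoted post; the 0.90-era `_id.path` syntax is an assumption based on that post:

```python
import json

# mapping body for the "album" type, created before registering the river
create_index_body = {
    "mappings": {
        "album": {
            "_id": {"path": "AlbumID"},  # derive each document id from AlbumID
            "properties": {
                "AlbumDescription": {"type": "string"},
                "AlbumID": {"type": "string"},
                "Artist": {"type": "string"},
                "Genre": {"type": "string", "index": "not_analyzed"},
                "Label": {"type": "string"},
                "Title": {"type": "string"},
            },
        }
    }
}

# serialized body for: curl -XPUT 'localhost:9200/myindex' -d @body.json
print(json.dumps(create_index_body, indent=2))
```

With the index created this way first, the river should only bulk-index documents and never touch the mapping.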
Re: The effect of multi-fields and copy_to on storage size
Ideas anyone?
Re: Sense on github abandoned?
Must is a strong word. I highlighted some alternatives earlier. On Apr 30, 2014 1:01 PM, @mromagnoli marce.romagn...@gmail.com wrote: Agree 100%. Sense must return to the Chrome Store! [...]
Re: Sense on github abandoned?
Yeah, maybe you are right. Anyway, I have installed Marvel and made a bookmark in Chrome with the URL to Sense. Perhaps I cried out too soon ;P On Wednesday, 30 April 2014 17:19:36 UTC-3, Ivan Brusic wrote: Must is a strong word. I highlighted some alternatives earlier. [...]
Re: help with jdbc rivers and type mapping
no. i just tried deleting all indexes, then I did:

PUT /myindex

then

PUT /myindex/album/_mapping
{
  "myindex": {
    "mappings": {
      "album": {
        "properties": {
          "AlbumDescription": { "type": "string" },
          "AlbumID": { "type": "string" },
          "Artist": { "type": "string" },
          "Genre": { "type": "string", "index": "not_analyzed" },
          "Label": { "type": "string" },
          "Title": { "type": "string" },
          "_id": { "path": "AlbumID" }
        }
      }
    }
  }
}

then I ran the PUT statement in my previous post. It still treats it as dynamic mappings. On Wednesday, April 30, 2014 3:56:22 PM UTC-4, Eric Sims wrote: i can't seem to understand how to fully set up my type mappings while using jdbc rivers and sql server. [...]
Re: ES and SAN storage
It would make sense, if only it were that simple :) The reason shards need to move through the higher levels of the stack is that every node maintains its own indexes (Lucene segments) and they can't just be switched over. And I think that is primarily because of how internal structures are maintained in Lucene. You might be able to develop a workaround using one or more of these settings: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-update-settings.html On Wed, Apr 30, 2014 at 1:05 PM, Patrick Proniewski elasticsea...@patpro.net wrote: On 30 avr. 2014, at 19:34, Mohit Anchlia wrote: [...]
Re: help with jdbc rivers and type mapping
The mapping has errors. Something like this might work better:

DELETE /myindex

PUT /myindex

PUT /myindex/album/_mapping
{
  "album": {
    "properties": {
      "AlbumDescription": { "type": "string" },
      "AlbumID": { "type": "string" },
      "Artist": { "type": "string" },
      "Genre": { "type": "string", "index": "not_analyzed" },
      "Label": { "type": "string" },
      "Title": { "type": "string" },
      "_id": { "index_name": "album.AlbumID", "path": "full", "type": "string" }
    }
  }
}

GET /myindex/album/_mapping

Jörg On Wed, Apr 30, 2014 at 10:34 PM, Eric Sims eric.sims.aent@gmail.com wrote: no. i just tried deleting all indexes [...]
Limit the amount of data generated by Marvel with marvel.agent.interval ?
I'm managing a pretty badass 11 node Elasticsearch cluster that is powering a customer facing dashboard reporting platform. 20 cores per node, 64GB RAM, SSDs, dual 10 GbE of awesome. I evaluated Marvel while we were still in development on the new platform and I found it to be a very valuable tool. At first Marvel was indexing to the same cluster we were monitoring, and this was okay while we were in development as there were plenty of extra cycles in the cluster to handle the load, but now that we are in production it doesn't make sense to burden the cluster with this. The nature of our reporting system requires us to have an index for each customer, so we're currently at 328 indexes and over 10,000 shards total. The amount of data indexed by Marvel increases dramatically as the number of indices increases, so once we got over 300 indices the daily Marvel index ended up at around 400 GB replicated and was indexing around 2,000 documents a second by itself. What I want to do is have Marvel index to a not-as-awesome 2 node Elasticsearch monitoring cluster: 12 cores, 64 GB RAM and spinning disks. But in practice these 2 nodes are unable to keep up with the load and get completely bogged down. I'm thinking I can sacrifice redundancy and buy myself some cycles by not using any replicas on the Marvel index. My other idea is to raise marvel.agent.interval from the default 10s to something like 30s, on the assumption that this will cut the amount of data generated to a third. Does this sound sane, or does anyone have other ideas on what I can try to limit the load? marvel.agent.interval Controls the interval between data samples. Defaults to 10s. Set to -1 to temporarily disable exporting. This setting is update-able via the Cluster Update Settings API. Thanks -Logan-
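The expected effect of the interval change is easy to estimate: sampling every 30s instead of every 10s keeps one sample in three, so the data volume should drop roughly in proportion. A back-of-the-envelope sketch using the figures quoted above (400 GB/day, ~2,000 docs/sec):

```python
daily_gb = 400          # observed daily Marvel index size (replicated)
docs_per_sec = 2000     # observed Marvel indexing rate

old_interval, new_interval = 10, 30   # marvel.agent.interval, in seconds
scale = old_interval / new_interval   # fraction of samples kept

print(round(daily_gb * scale))        # ~133 GB/day
print(round(docs_per_sec * scale))    # ~667 docs/sec
```

This is only a proportional estimate; per-document overhead and the number of indices being sampled also matter, so the real saving may differ.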
Re: help with jdbc rivers and type mapping
here's another weird bit. It doesn't seem to show the mappings right after I set them:

PUT /myindex/album/_mapping
{
  "myindex": {
    "mappings": {
      "album": {
        "properties": {
          "albumdescription": { "type": "string" },
          "albumid": { "type": "string" },
          "artist": { "type": "string" },
          "genre": { "type": "string", "index": "not_analyzed" },
          "label": { "type": "string", "analyzer": "whitespace" },
          "title": { "type": "string" },
          "time": { "type": "string" },
          "_id": { "index_name": "album.AlbumID", "path": "full", "type": "string" }
        }
      }
    }
  }
}

GET /myindex/album/_mapping

returns this:

{
  "myindex": {
    "mappings": {
      "album": {
        "properties": {}
      }
    }
  }
}
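One plausible explanation for the empty result is the shape of the PUT body: the put-mapping API expects the type name as the top-level key, while the body above wraps it in index-name and "mappings" keys, which is the shape that GET /_mapping returns, not the shape PUT expects. A small sketch of the difference; whether this particular ES 0.90 build silently ignores the extra wrapper (rather than erroring) is an assumption worth verifying:

```python
properties = {"title": {"type": "string"}}

# shape that was sent: wrapped the way GET /_mapping *returns* mappings
sent = {"myindex": {"mappings": {"album": {"properties": properties}}}}

# shape the put-mapping API expects: the type name at the top level
expected = {"album": {"properties": properties}}

print(list(expected) == ["album"])  # True
```

Stripping the outer wrapper so the body starts at the "album" key, as in Jörg's earlier example, is the first thing to try.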
Re: Limit the amount of data generated by Marvel with marvel.agent.interval ?
That's pretty sane. I believe the newest version of Marvel increased the default interval from 5s to 10s. But be aware, you are breaking the license for Marvel with that number of nodes - http://www.elasticsearch.org/overview/marvel/ Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 1 May 2014 06:52, Logan Hardy loganbha...@gmail.com wrote: I'm managing a pretty badass 11 node Elasticsearch cluster that is powering a customer facing dashboard reporting platform. [...]
Re: Performance of Indexed-Shape Queries Vs Geoshape Queries
Hi Alex, Thanks for your response. Does this mean that the shape I query by does not need to be indexed by Elasticsearch on the fly? Or does it mean that the indexing of the shape is so quick it does not affect the query latency? Thank you, Ilya. On 21 April 2014 22:46, Alexander Reelsen a...@spinscale.de wrote: Hey, the main difference is basically the network overhead. What happens behind the curtains is that a GET request for the shape is executed if you specify it in the request, and then this shape is used as if it had been provided inline. Makes sense? --Alex On Tue, Apr 15, 2014 at 6:50 AM, ipari...@thoughtworks.com wrote: Hi, We ran tests comparing the performance of indexed-shape queries to custom geoshape queries. We found that Elasticsearch yielded roughly the same results in both cases. We expected indexed-shape queries to be faster than custom geoshape queries; our understanding is that Elasticsearch has to convert the custom geoshapes to a quadtree on the fly, as opposed to having it pre-generated. I was wondering if anyone could let us know why there is no difference in performance between these two query types.

Experiment Design

We indexed suburb boundary geometries into one doctype, and geocoded points of interest (POIs) into another. We picked the top 20 suburbs with the geometries that have the most vertices, and ran the two following queries for each suburb geometry.

Geoshape Query

GET /spike_index/doc_type_pois/_search
{
  "query": {
    "geo_shape": {
      "field_geocode": {
        "shape": {
          "type": "polygon",
          "coordinates": [ suburb multipolygon ]
        }
      }
    }
  }
}

Indexed-Shape Query

GET /spike_index/doc_type_pois/_search
{
  "query": {
    "geo_shape": {
      "field_geocode": {
        "indexed_shape": {
          "id": "pre-indexed-geometry-id",
          "type": "doc_type_suburb_quadtree",
          "index": "spike_index",
          "path": "field_geometry"
        }
      }
    }
  }
}

The test was carried out using Siege from a box located within the same VPC as the Elasticsearch instances. Please find the results below.

Indexed-Shape Query Results

Transactions: 749559 hits
Availability: 100.00 %
Elapsed time: 602.80 secs
Data transferred: 10342.97 MB
Response time: 0.01 secs
Transaction rate: 1243.46 trans/sec
Throughput: 17.16 MB/sec
Concurrency: 14.92
Successful transactions: 749559
Failed transactions: 0
Longest transaction: 5.01
Shortest transaction: 0.00

Geoshape Query Results

Transactions: 723894 hits
Availability: 100.00 %
Elapsed time: 599.16 secs
Data transferred: 9988.83 MB
Response time: 0.01 secs
Transaction rate: 1208.18 trans/sec
Throughput: 16.67 MB/sec
Concurrency: 14.92
Successful transactions: 723894
Failed transactions: 0
Longest transaction: 1.02
Shortest transaction: 0.00

If anyone could shed some light on why the results of these queries are the same, that would be very helpful.
Re: how to aggregate by metadata (types/field names)?
bump I'm new to Elasticsearch, considering a move from a proprietary system... I'm blocked on the fact that I can't get a list of field hits per document as part of the search results... Any help, any clue?