Re: simple query string with flags returns no results
Are you actually using a comma after "firstname^1.3"? That trailing comma makes it invalid JSON in both cases.

2015-04-28 14:15 GMT-03:00 Daniel Nill danielln...@gmail.com:

curl -XPUT 'http://0.0.0.0:9200/users' -d '{
  "first_name": "daniel",
  "last_name": "nill"
}'

curl -XGET 'http://0.0.0.0:9200/users/_search' -d '{
  "query": {
    "bool": {
      "must": {
        "simple_query_string": {
          "query": "daniel nill",
          "fields": ["lastname^6.5", "firstname^1.3",],
          "default_operator": "and",
          "flags": "AND|OR|NOT|PHRASE|PRECEDENCE"
        }
      }
    }
  }
}'

This returns no results. However,

curl -XGET 'http://0.0.0.0:9200/users/_search' -d '{
  "query": {
    "bool": {
      "must": {
        "simple_query_string": {
          "query": "daniel nill",
          "fields": ["lastname^6.5", "firstname^1.3",],
          "default_operator": "and"
        }
      }
    }
  }
}'

This returns results. Any idea what I'm missing? This is on 1.5.1.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/83d32c16-80b7-4428-904b-4d5bc9055be0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
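The trailing comma the reply points at is enough on its own to break strict JSON parsing. A quick local check (Python here, with the body trimmed to just the simple_query_string part for brevity) shows the difference before anything is sent to Elasticsearch:

```python
import json

# Body as posted: note the trailing comma after "firstname^1.3" in "fields"
broken = '''
{
  "query": {
    "simple_query_string": {
      "query": "daniel nill",
      "fields": ["lastname^6.5", "firstname^1.3",],
      "default_operator": "and"
    }
  }
}
'''

# Same body with the trailing comma removed
fixed = broken.replace('"firstname^1.3",]', '"firstname^1.3"]')

try:
    json.loads(broken)
    print("broken body parsed (unexpected)")
except ValueError as e:
    print("broken body rejected:", e)

print("fixed fields:", json.loads(fixed)["query"]["simple_query_string"]["fields"])
```

Whether a given Elasticsearch version tolerates the comma at all is a separate question; validating the body locally like this removes one variable from the debugging.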
Re: ScrollId doesn't advance with 2 indexes on a read alias 1.4.4
Are you sure that calling the same scroll_id won't return the next results? AFAIK, the scroll_id can stay the same and still return new records.

2015-04-14 14:26 GMT-03:00 Todd Nine tn...@apigee.com:

Hey guys,
I have 2 indexes. I have a read alias on both of the indexes (A and B), and a write alias on 1 (B). I then insert 10 documents to the write alias, which inserts them into index B. I perform the following query:

{
  "from": 0,
  "size": 1,
  "post_filter": {
    "bool": {
      "must": {
        "term": {
          "edgeSearch": "4cd2ba95-e2c9-11e4-bb39-c6c6eebe8d56_application__4cd2ba96-e2c9-11e4-bb39-c6c6eebe8d56_owner__users__SOURCE"
        }
      }
    }
  },
  "sort": [
    { "fields.double": { "order": "asc", "nested_filter": { "term": { "name": "ordinal" } } } },
    { "fields.long": { "order": "asc", "nested_filter": { "term": { "name": "ordinal" } } } },
    { "fields.string.exact": { "order": "asc", "nested_filter": { "term": { "name": "ordinal" } } } },
    { "fields.boolean": { "order": "asc", "nested_filter": { "term": { "name": "ordinal" } } } }
  ]
}

I receive my first record, and a scroll id, as expected. On my next request, I perform a request with the scroll id from the first response.

What I expect: to receive my second record, and a new scroll id.
What I get: the first record again, with the same scroll id.

I'm on a 1.4.4 server, with a 1.4.4 node client running local integration testing. When I use the same logic on a read alias with a single index, I do not experience this problem, so I'm reasonably certain my client is coded correctly. Any ideas?

Thanks,
Todd
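As the reply notes, the scroll_id is not guaranteed to change between pages, so a scroll client should key its loop on an empty page of hits, not on the id changing. A rough sketch of such a loop (Python, with a tiny in-memory stand-in for the HTTP calls; the fake deliberately returns the same id every time while pages still advance):

```python
def scroll_all(search, scroll_next):
    """Drain a scroll cursor.

    search()         -> (scroll_id, hits) for the initial request
    scroll_next(sid) -> (scroll_id, hits) for each follow-up page

    Always pass the most recently returned id, and stop only when a
    page comes back with no hits.
    """
    scroll_id, hits = search()
    while hits:
        yield from hits
        scroll_id, hits = scroll_next(scroll_id)

# In-memory stand-in: the id never changes, yet pages still advance.
def make_fake(docs, page=2):
    pages = [docs[i:i + page] for i in range(0, len(docs), page)] + [[]]
    it = iter(pages)
    return (lambda: ("sid-0", next(it))), (lambda sid: ("sid-0", next(it)))

search, scroll_next = make_fake([1, 2, 3, 4, 5])
print(list(scroll_all(search, scroll_next)))  # → [1, 2, 3, 4, 5]
```

This only illustrates the looping contract; it doesn't explain the poster's symptom of the same record being returned twice, which looks like a separate issue.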
Re: Alert notification with percolator
I have never used percolator, but AFAIK you have to call the percolate API after you have the document indexed: http://www.elastic.co/guide/en/elasticsearch/reference/current/search-percolate.html#_percolating_an_existing_document

2015-04-02 15:25 GMT-03:00 Lincoln Xiong xiong.huang...@gmail.com:

I'm trying to use Elasticsearch as a secondary log output storage, to analyze some info in the logs. In this use case, an alert trigger would be very useful. I read through the docs on percolator and I think this should be the way to do it. But after some experimenting, I found that I don't really understand how percolator works. It seems that if I use the REST API to index a document with a percolator already set up, the response tells me whether that document matches the percolator query or not. In my case I use Logstash as the input, which of course never sees that kind of feedback. There also appears to be a count accessible over REST that I could use to get this kind of feedback from the percolator, but I can't find it anywhere. Could someone give me an idea of how I can achieve this kind of feature with Elasticsearch? I know there are ways to trigger an alert in Logstash, but for my case Logstash is a temporary tool to input the data, and I may not use it in the future. I also noticed that Graylog has a similar kind of alert: when an input event matches some keywords, the alarm triggers. I guess it also uses the percolator APIs, but I would like to know how I can do this with Elasticsearch alone. Thanks a lot.
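For reference, the percolator workflow the linked docs describe is two-step: register a query under the index's .percolator type, then ask which registered queries match a given document. The request bodies might look roughly like this (shown as Python dicts; the index name log-index, the query id error-alert, and the field names are all invented for illustration):

```python
import json

# 1) Register a percolator query, e.g. PUT /log-index/.percolator/error-alert
alert_query = {
    "query": {
        "match": {"message": "error"}
    }
}

# 2) Percolate a candidate document, e.g. GET /log-index/logline/_percolate
percolate_request = {
    "doc": {
        "message": "disk error on /dev/sda",
        "level": "ERROR",
    }
}

# The percolate response lists the ids of registered queries that matched;
# that list is the "feedback" the poster is after, and a separate process
# can wrap this call to raise alerts.
print(json.dumps(alert_query))
print(json.dumps(percolate_request))
```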
Re: How does Elasticsearch convert dates to JSON string representations?
Elastic won't edit your source. The long type is used internally.

2015-03-20 14:16 GMT-03:00 Erik Iverson erikriver...@gmail.com:

Hello everyone,

I have a question about how Elasticsearch returns JSON representations of fields with the date type. My confusion comes from the fact that the page http://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-core-types.html says:

"The date type is a special type which maps to JSON string type. It follows a specific format that can be explicitly set. *All dates are UTC. Internally, a date maps to a number type long, with the added parsing stage from string to long and from long to string.*" (emphasis mine)

It sounds like dates are stored as type long. But when I POST documents with dates and then retrieve them, they are returned in the same format as I POSTed them. So it appears ES is storing how I POSTed each date somewhere. I have a reproducible curl example demonstrating my confusion in more detail on Stack Overflow here: http://stackoverflow.com/questions/29157945/how-does-elasticsearch-convert-dates-to-json-string-representations

Thank you for any insights!

Best,
--Erik Iverson
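The behaviour both replies in this thread describe — a long used internally for sorting and ranges, while _source keeps the original string untouched — can be mirrored locally. A sketch of the two parsing stages the docs mention (Python; the exact format string is just one common ISO-8601 variant):

```python
from datetime import datetime, timezone

def to_epoch_millis(date_str):
    """Index-time stage: parse a UTC date string to a long (epoch millis)."""
    dt = datetime.strptime(date_str, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)

def from_epoch_millis(millis):
    """Reverse stage: format the stored long back into a UTC date string."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

stored = to_epoch_millis("2015-03-20T17:16:00Z")
print(stored)                     # the long the index actually works with
print(from_epoch_millis(stored))  # → "2015-03-20T17:16:00Z"
```

The round trip through the long is lossless here, but the point of the thread stands: what you get back from a GET is the verbatim _source string, not the result of this reverse conversion.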
Re: How does Elasticsearch convert dates to JSON string representations?
Well, the company won't edit his source anyway :p (but I get your point; I'm used to referring to Elasticsearch as Elastic, I have to fix that). I think his question is this: he posts a document with a date in string format and retrieves it in the same format. He was expecting to retrieve it as a long, since that is the type Elasticsearch uses internally. I'm not familiar with the internal code of Elasticsearch, but as far as I know, it won't change the source during indexing. It probably uses the long type in the index, but when you retrieve the source, you retrieve exactly the source you posted.

2015-03-20 16:16 GMT-03:00 Mark Walkom markwal...@gmail.com:

It's Elasticsearch; Elastic is the company :)
We convert dates to unix epoch, which is why you should insert them as UTC.

On 20 March 2015 at 10:22, Roger de Cordova Farias roger.far...@fontec.inf.br wrote:

Elastic won't edit your source. The long type is used internally.

2015-03-20 14:16 GMT-03:00 Erik Iverson erikriver...@gmail.com:

Hello everyone, I have a question about how Elasticsearch returns JSON representations of fields with the date type. My confusion comes from the fact that the page http://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-core-types.html says: "The date type is a special type which maps to JSON string type. It follows a specific format that can be explicitly set. *All dates are UTC. Internally, a date maps to a number type long, with the added parsing stage from string to long and from long to string.*" (emphasis mine) It sounds like dates are stored as type long. But when I POST documents with dates and then retrieve them, they are returned in the same format as I POSTed them. So it appears ES is storing how I POSTed each date somewhere. I have a reproducible curl example demonstrating my confusion in more detail on Stack Overflow here: http://stackoverflow.com/questions/29157945/how-does-elasticsearch-convert-dates-to-json-string-representations Thank you for any insights!
Best,
--Erik Iverson
Re: What's wrong with this query?
Look at this example of how to use multiple filters: http://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-filtered-query.html#_multiple_filters
You should wrap them in a bool filter.

2015-03-17 15:32 GMT-03:00 jrkroeg jrkr...@gmail.com:

I'm trying to get the top 100 documents which match the filtered criteria, sorted by distance from pin.location. Here's my query, which isn't producing an error, but it should be returning results and isn't:

{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": [
        { "term": { "searchTerm1": "N" } },
        { "term": { "searchTerm2": "Y" } },
        { "term": { "searchTerm3": "Y" } },
        { "term": { "searchTerm4": "Y" } }
      ]
    }
  },
  "sort": [
    {
      "_geo_distance": {
        "pin.location": { "lat": 34.073620, "lon": -118.400356 },
        "order": "asc",
        "unit": "mi"
      }
    }
  ],
  "size": 100
}

On a separate note, I'd like to find a way to make the filter more of a suggestion, rather than forced - how would I achieve this?
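Concretely, the fix the reply points at is to move the bare array of term filters under a bool filter's must clause. A sketch of the corrected body (field names kept from the question; shown as a Python dict only so it can be validated locally):

```python
import json

query = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            # "filter" expects a single filter, not a bare array; wrap the
            # individual term filters in a bool filter's "must" list.
            "filter": {
                "bool": {
                    "must": [
                        {"term": {"searchTerm1": "N"}},
                        {"term": {"searchTerm2": "Y"}},
                        {"term": {"searchTerm3": "Y"}},
                        {"term": {"searchTerm4": "Y"}},
                    ]
                }
            },
        }
    },
    "sort": [
        {
            "_geo_distance": {
                "pin.location": {"lat": 34.073620, "lon": -118.400356},
                "order": "asc",
                "unit": "mi",
            }
        }
    ],
    "size": 100,
}

print(json.dumps(query, indent=2))
```

For the poster's second question, one plausible approach is to express the criteria as should clauses in a bool query rather than as filters: filters only include or exclude and never score, whereas should clauses make matching documents rank higher without being mandatory.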
Terms aggregations in docs with nested objects using a lot of memory
We are running Elasticsearch in a cluster with 1 node, 1 index, 6 shards, and 55 million docs. We run queries with terms aggregations on 15 fields and it works well, taking about 10 seconds to return.

We reindexed the docs into another cluster with 1 node, 1 index, 4 shards, and the same 55 million docs to run some tests. The mapping is a little different, now having some nested objects. We run the same queries as before (adapted to use nested queries and aggregations), but we always get a circuit breaker error, because loading the fields into memory for the aggregation would take more memory than is available. Both machines have the same configuration (64GB of memory, running ES with ES_HEAP_SIZE=32g).

I used the stats API (_stats/fielddata?fields=my_field&pretty) on both machines to get some info about fielddata for a field whose mapping didn't change at all, living directly in the root document (not nested), and I got a huge difference in memory usage:

*Machine 1:*

{
  "_shards": { "total": 8, "successful": 4, "failed": 0 },
  "_all": {
    "primaries": {
      "fielddata": {
        "memory_size_in_bytes": 28132578552,
        "evictions": 0,
        "fields": { "my_field": { "memory_size_in_bytes": 224983649 } }
      }
    },
    "total": {
      "fielddata": {
        "memory_size_in_bytes": 28132578552,
        "evictions": 0,
        "fields": { "my_field": { "memory_size_in_bytes": 224983649 } }
      }
    }
  },
  "indices": {
    "my_index_1": {
      "primaries": {
        "fielddata": {
          "memory_size_in_bytes": 28132578552,
          "evictions": 0,
          "fields": { "my_field": { "memory_size_in_bytes": 224983649 } }
        }
      },
      "total": {
        "fielddata": {
          "memory_size_in_bytes": 28132578552,
          "evictions": 0,
          "fields": { "my_field": { "memory_size_in_bytes": 224983649 } }
        }
      }
    }
  }
}

*Machine 2:*

{
  "_shards": { "total": 12, "successful": 6, "failed": 0 },
  "_all": {
    "primaries": {
      "fielddata": {
        "memory_size_in_bytes": 6812053739,
        "evictions": 0,
        "fields": { "my_field": { "memory_size_in_bytes": 62533082 } }
      }
    },
    "total": {
      "fielddata": {
        "memory_size_in_bytes": 6812053739,
        "evictions": 0,
        "fields": { "my_field": { "memory_size_in_bytes": 62533082 } }
      }
    }
  },
  "indices": {
    "my_index_2": {
      "primaries": {
        "fielddata": {
          "memory_size_in_bytes": 6812053739,
          "evictions": 0,
          "fields": { "my_field": { "memory_size_in_bytes": 62533082 } }
        }
      },
      "total": {
        "fielddata": {
          "memory_size_in_bytes": 6812053739,
          "evictions": 0,
          "fields": { "my_field": { "memory_size_in_bytes": 62533082 } }
        }
      }
    }
  }
}

While in the old index the field uses *62.5331MB*, in the new index it uses *224.984MB*. Heavier fields that use about 1GB in the old index are using 4-6GB in the new index. With the 15 aggregations together, the memory usage grows to a size that won't fit in the heap.

Does the fact that the documents have nested objects change the amount of memory needed to keep non-nested fields in memory? I tested using include_in_root on every nested object and doing all my aggregations directly on the root doc (not using nested aggregations at all), and still every field uses far more memory than in the old index, with the same data.

Can someone explain it? I have no clue.
Search within array
I'm searching on an array of objects. The problem is that when I search using a query string ( http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#query-dsl-query-string-query ), it matches text split across different objects (different array positions). Is there a way to avoid this behavior and make the query string match within a single array position? I know that I could index the field with a high position_offset_gap and search using a phrase, but I don't need the text to be in order, only within the same array position.
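One common way to get this behavior is to map the array as nested and search it with a nested query: each nested object is indexed as its own hidden document, so terms must co-occur in the same array element, without any ordering requirement. A sketch of that mapping and query (field names comments/text are invented for illustration):

```python
import json

# Hypothetical mapping: each element of "comments" becomes its own
# Lucene document, so a match is scoped to one array position.
mapping = {
    "properties": {
        "comments": {
            "type": "nested",
            "properties": {
                "text": {"type": "string"}
            }
        }
    }
}

# Query side: wrap the query_string in a nested query; terms must all
# occur within a single array element, in any order.
search_body = {
    "query": {
        "nested": {
            "path": "comments",
            "query": {
                "query_string": {"query": "foo bar", "fields": ["comments.text"]}
            }
        }
    }
}

print(json.dumps(mapping), json.dumps(search_body))
```

The trade-off is the indexing and memory overhead of nested documents, which another thread in this digest runs into.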
Re: Ignore a field in the scoring
Thank you very much.

2015-01-08 4:35 GMT-02:00 Masaru Hasegawa haniomas...@gmail.com:

Hi,

I believe it's intended, according to https://lucene.apache.org/core/4_10_2/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html . It says:

"Note that CollectionStatistics.maxDoc() is used instead of IndexReader#numDocs() because also TermStatistics.docFreq() is used, and when the latter is inaccurate, so is CollectionStatistics.maxDoc(), and in the same direction. In addition, CollectionStatistics.maxDoc() is more efficient to compute"

Masaru

On Thu, Jan 8, 2015 at 12:01 AM, Roger de Cordova Farias roger.far...@fontec.inf.br wrote:

Thank you for your explanation. Do you know if it is a bug or intended behavior? I don't think deleted (marked-as-deleted) docs should be used at all.

2015-01-07 1:53 GMT-02:00 Masaru Hasegawa haniomas...@gmail.com:

Hi,

An update is a delete plus an add. I mean, instead of updating the existing document, it deletes it and adds it as a new document. And those deleted documents are just marked as deleted and aren't actually removed from the index until a segment merge. IDF doesn't exclude those deleted-but-not-removed documents (it still counts them). That's the reason you see a different IDF score (you see both maxDocs and docFreq incremented).

Regarding 424 vs. 0: the document had ID 424 (Lucene's internal ID). But when the document was updated (delete + add), it got the new ID 0 in a new segment.

So, I think it's not possible to keep the score when you update documents. You could run optimize with max_num_segments=1 every time you update documents, but that's not practical (and until the optimize is done, you'd still see a different score).

Masaru
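The effect Masaru describes is visible directly in Lucene's default similarity, where idf is computed from maxDocs and docFreq. With the numbers from the explain output quoted elsewhere in this thread, the before/after idf values can be reproduced (to the precision shown here; Lucene works in float, so the last decimals differ):

```python
import math

def lucene_idf(doc_freq, max_docs):
    """Lucene TFIDFSimilarity: idf(t) = 1 + ln(maxDocs / (docFreq + 1))."""
    return 1 + math.log(max_docs / (doc_freq + 1))

# Before the update: idf(docFreq=201, maxDocs=738240) in the explain output
print(round(lucene_idf(201, 738240), 4))  # → 9.2038 (explain shows 9.203756)

# After the update: the old version is only marked deleted, so both counts
# go up by one and the idf — hence the score — shifts slightly.
print(round(lucene_idf(202, 738241), 4))  # → 9.1988 (explain shows 9.198819)
```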
Re: Ignore a field in the scoring
Now I ran the query with explain = true. The results are the following:

*Explain before the update:*

"details": [
  {
    "value": 5.752348,
    "description": "fieldWeight in 424, product of:",
    "details": [
      {
        "value": 1,
        "description": "tf(freq=1.0), with freq of:",
        "details": [ { "value": 1, "description": "termFreq=1.0" } ]
      },
      { "value": 9.203756, "description": "idf(docFreq=201, maxDocs=738240)" },
      { "value": 0.625, "description": "fieldNorm(doc=424)" }
    ]
  }
]

*Update script (scriptLang = groovy, profileId = 1):*

if (ctx._source.bookmarked_by == null) {
    ctx._source.bookmarked_by = [profileId]
} else if (ctx._source.bookmarked_by.contains(profileId)) {
    ctx.op = "none"
} else {
    ctx._source.bookmarked_by += profileId
}

*Explain after the update:*

"details": [
  {
    "value": 5.749262,
    "description": "fieldWeight in 0, product of:",
    "details": [
      {
        "value": 1,
        "description": "tf(freq=1.0), with freq of:",
        "details": [ { "value": 1, "description": "termFreq=1.0" } ]
      },
      { "value": 9.198819, "description": "idf(docFreq=202, maxDocs=738241)" },
      { "value": 0.625, "description": "fieldNorm(doc=0)" }
    ]
  }
]

*Query used with the explain:*

{
  "query": {
    "query_string": {
      "fields": [ "name" ],
      "query": "roger"
    }
  }
}

The inverse document frequency (idf) changed after adding a new field that is not used in the query. It also changed "fieldWeight in 424" and "fieldNorm(doc=424)" to "fieldWeight in 0" and "fieldNorm(doc=0)" (I don't know whether that changes anything).

Can someone help me figure out how to keep the score of the document unchanged after running the update? Note that the update creates a new field if one was not found (== null), but this field is not used in the query.

2015-01-05 13:35 GMT-02:00 Roger de Cordova Farias roger.far...@fontec.inf.br:

The added field is an array of integers, but we are not using it in the query at all. We are not querying the _all field; it is disabled in our type mapping. Our query is something like this:

{
  "query": {
    "query_string": {
      "fields": [ "name" ],
      "query": "roger"
    }
  }
}

I ran this query. In the first result, I added a new field called bookmarked_by with a numeric value. Then I ran the same query again. The document to which I added the new field is no longer the first result.

2014-12-26 17:34 GMT-02:00 Doug Turnbull dturnb...@opensourceconnections.com:

Are you querying the _all field? How are you doing your searches? http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-all-field.html

The _all field receives a copy of every field you index, so adding data here could impact scores regardless of the source field. Otherwise, fields are scored independently before being put together by other queries like boolean queries or dismax. Are you using boolean/dismax/etc. over multiple fields?

-Doug

On Fri, Dec 26, 2014 at 11:59 AM, Ivan Brusic i...@brusic.com wrote:

Use the field in a filter and not as part of the query. Is this field free text?

Ivan

On Dec 23, 2014 9:12 PM, Roger de Cordova Farias roger.far...@fontec.inf.br wrote:

Hello,

Our documents have metadata indexed with them, but we don't want the metadata to interfere with the scoring. After a user searches for documents, they can bookmark them (which means we add more metadata to the document); then, in the next search with the same query, the bookmarked document appears in a lower (worse) position.

Is there a way to completely ignore one or more specific fields in the scoring of every query, at indexing time or otherwise? Note that we are not using the metadata field in the query, and yet it lowers the score in every query. We cannot set the field's index attribute to "no" because we are going to use it in other queries.
Re: Ignore a field in the scoring
The added field is an array of integers, but we are not using it in the query at all. We are not querying the _all field; it is disabled in our type mapping. Our query is something like this:

{
  "query": {
    "query_string": {
      "fields": [ "name" ],
      "query": "roger"
    }
  }
}

I ran this query. In the first result, I added a new field called bookmarked_by with a numeric value. Then I ran the same query again. The document to which I added the new field is no longer the first result.

2014-12-26 17:34 GMT-02:00 Doug Turnbull dturnb...@opensourceconnections.com:

Are you querying the _all field? How are you doing your searches? http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-all-field.html

The _all field receives a copy of every field you index, so adding data here could impact scores regardless of the source field. Otherwise, fields are scored independently before being put together by other queries like boolean queries or dismax. Are you using boolean/dismax/etc. over multiple fields?

-Doug

On Fri, Dec 26, 2014 at 11:59 AM, Ivan Brusic i...@brusic.com wrote:

Use the field in a filter and not as part of the query. Is this field free text?

Ivan

On Dec 23, 2014 9:12 PM, Roger de Cordova Farias roger.far...@fontec.inf.br wrote:

Hello,

Our documents have metadata indexed with them, but we don't want the metadata to interfere with the scoring. After a user searches for documents, they can bookmark them (which means we add more metadata to the document); then, in the next search with the same query, the bookmarked document appears in a lower (worse) position.

Is there a way to completely ignore one or more specific fields in the scoring of every query, at indexing time or otherwise? Note that we are not using the metadata field in the query, and yet it lowers the score in every query. We cannot set the field's index attribute to "no" because we are going to use it in other queries.
--
Doug Turnbull
Search & Big Data Architect
OpenSource Connections http://o19s.com
Ignore a field in the scoring
Hello,

Our documents have metadata indexed with them, but we don't want the metadata to interfere with the scoring. After a user searches for documents, they can bookmark them (which means we add more metadata to the document); then, in the next search with the same query, the bookmarked document appears in a lower (worse) position.

Is there a way to completely ignore one or more specific fields in the scoring of every query, at indexing time or otherwise? Note that we are not using the metadata field in the query, and yet it lowers the score in every query. We cannot set the field's index attribute to "no" because we are going to use it in other queries.
How to use json in update script
Hello. I'm trying to update a document whose root object contains a list of nested objects. I need to send an object of the nested type as a script parameter and append it to the list. How can I append the JSON (a string) to the root object's nested objects list using Groovy, or should I use another script lang? I tried using JsonSlurper (http://groovy-lang.org/json.html) in Groovy, which converts between JSON and Groovy objects, but I always get:

Caused by: org.elasticsearch.script.groovy.GroovyScriptCompilationException: MultipleCompilationErrorsException[startup failed:
Script3.groovy: 2: unable to resolve class JsonSlurper @ line 2, column 19.
   def jsonSlurper = new JsonSlurper();
                     ^
1 error]
    at org.elasticsearch.script.groovy.GroovyScriptEngineService.compile(GroovyScriptEngineService.java:117)
    at org.elasticsearch.script.ScriptService.getCompiledScript(ScriptService.java:368)
    at org.elasticsearch.script.ScriptService.compile(ScriptService.java:354)
    at org.elasticsearch.script.ScriptService.executable(ScriptService.java:497)
    at org.elasticsearch.action.update.UpdateHelper.prepare(UpdateHelper.java:149)
    ... 8 more
Re: How to use json in update script
Ok, I found out that I can send a JSON object as a script parameter and just append it to the nested objects list (with list += newObject or list.add(newObject)) using Groovy, and it works.

But it is not working with the Java API; I can only get it to work using the REST API. When using Java, the JSON is treated as a string, and I get the error:

object mapping [objectsList] trying to serialize a value with no field associated with it, current value [{"field":"value"}]

I can reproduce the error in the REST API by wrapping the JSON parameter in quotes:

*Works (using REST API):*

{
  "script": "ctx._source.objectsList += newObject",
  "params": { "newObject": { "field": "value" } },
  "lang": "groovy"
}

*Does not work (using REST API):*

{
  "script": "ctx._source.objectsList += newObject",
  "params": { "newObject": "{\"field\": \"value\"}" },
  "lang": "groovy"
}

*Does not work (using Java API):*

String script = "ctx._source.objectsList += newObject";
Re: How to use json in update script
*Does not work (using Java API):*

String script = "ctx._source.objectsList += newObject";
UpdateRequestBuilder prepareUpdate = client.prepareUpdate(indexName, typeName, id);
prepareUpdate.setScriptLang("groovy");
prepareUpdate.setScript(script, ScriptType.INLINE);
prepareUpdate.addScriptParam("newObject", "{\"status\":\"aasdsd\"}");
prepareUpdate.get();

Is there a way to reproduce the working REST API behavior with the Java API?
Unique values on the matching docs
Hello. I have a query with a from/size, and I need to get the unique values of a specific field of the returned docs only. I could do it on the client side, but it would help if Elasticsearch could do it for me. The terms aggregation helps with getting the unique values, but it ignores the from/size of the query. Is there a way to run the terms aggregation on the results only, or is there another way of getting unique values from the search result? Thanks in advance, Roger
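For reference, aggregations run over the full set of documents matching the query, not over the from/size page; a request like the sketch below (field name hypothetical) returns unique values for all matches:

```json
{
  "from": 0,
  "size": 10,
  "query": { "match_all": {} },
  "aggs": {
    "unique_values": {
      "terms": { "field": "my_field" }
    }
  }
}
```

As far as I know, restricting a terms aggregation to the current page is not supported by the aggregation itself, so the unique values of just the returned page would have to be computed client-side.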
Re: Advice on migrating 1.3.2 to 1.4.1
Thank you for the advice.

2014-12-04 9:30 GMT-02:00 Elvar Böðvarsson elv...@gmail.com:

I upgraded our logging cluster to 1.4 without any problems. When I looked into upgrading a separate dev/test instance used for a different purpose, I ran into problems with the plugins. If you are using plugins, make sure they are supported in 1.4.
Advice on bookmarking docs
We have a lot of docs like this:

{
  "_type": "doc",
  "_id": "123",
  "_source": { "parent_name": "abc" }
}

Each doc has only one parent_name, but multiple docs can have the same parent. It is like a many-to-one relationship, but the parent has no other info apart from its name, so we didn't create a separate doc for parents.

Now we want to give our users the option to bookmark parents, so that a user can later run queries on docs that are children of his bookmarked parents only. We could easily do that with a terms filter like this:

{
  "filter": {
    "terms": { "parent_name": [ "abc", "def", "ghi" ] }
  }
}

We could pass the filter all the user's bookmarked parent names, persisted, let's say, in a relational database. But the problem is that we have more than 50 million docs and a user can bookmark millions of parents. It would be too heavy to send a filter with millions of terms in every request, so we need to handle the bookmarks directly in Elasticsearch.

We considered using a filtered alias, so that the very same filter is persisted in Elasticsearch and we won't have to pass it in every request. That would already be much better than passing the filter in each request, but we want more: we want it to be very performant. Filtering with millions of terms would be slow, even if we don't need to send the filter in the request.

So we decided to add to our docs a meta field with information like "who bookmarked me", something like this:

{
  "_type": "doc",
  "_id": "123",
  "_source": {
    "parent_name": "abc",
    "bookmarked_by": [ "roger", "john" ]
  }
}

Then we can use a term filter (term, without the s) like this:

{
  "filter": {
    "term": { "bookmarked_by": "roger" }
  }
}

That would be (I hope) far more performant than our last approach, but it still has issues. The problem now is updating bookmarks. When the user bookmarks/un-bookmarks a parent, we can query for all docs with this parent and update their bookmarked_by field with the user identifier. That is ok.

But what happens when we add a new doc with a parent the user bookmarked before? We could query for the other docs with the same parent and copy the bookmarked_by field to the new doc, but that is ugly. So we concluded we need to have the bookmarked_by field centralized in a parent doc. We considered the following approaches:

*1 - parent-child relationship*

{ "_type": "parent", "_id": "1", "_source": { "bookmarked_by": [ "roger", "john" ] } }
{ "_type": "child", "_id": "1", "_parent": "1", "_source": {} }
{ "_type": "child", "_id": "2", "_parent": "1", "_source": {} }

Then, when user "roger" queries the children, the query also carries a has_parent filter like this:

{
  "has_parent": {
    "parent_type": "parent",
    "filter": { "term": { "bookmarked_by": "roger" } }
  }
}

*2 - nested type*

{ "_type": "parent", "_id": "1", "_source": { "bookmarked_by": [ "roger", "john" ], "children": [ { "id": "1" }, { "id": "2" } ] } }

Then, when user "roger" does a query, we use a nested query to query only the children of bookmarked parents:

{
  "nested": {
    "path": "children",
    "query": {
      actual_query,
      "filter": {
        "has_parent": {
          "parent_type": "parent",
          "filter": { "term": { "bookmarked_by": "roger" } }
        }
      }
    }
  }
}

*3 - No actual joins*

{ "_type": "parent", "_id": "1", "_source": { "name": "abc", "bookmarked_by": [ "roger", "john" ] } }
{ "_type": "child", "_id": "1", "_source": { "parent_name": "abc", "bookmarked_by": [ "roger", "john" ] } }
{ "_type": "child", "_id": "2", "_source": { "parent_name": "abc", "bookmarked_by": [ "roger", "john" ] } }

Then, every time a parent gets updated, we query for all its children (using the parent_name field) and update their bookmarked_by fields to reflect the updated parent's bookmarked_by field. And every time we add a new child doc, we query for its parent and copy the parent's bookmarked_by field to the new doc.

The main problem with the first two approaches is the need to do the join at query time. I didn't test them, but I think that joining millions of docs could be far slower than not joining at all. Also, the nested type approach has the issue of returning the parent doc on queries, while we need to return only the matching children. The third approach looks to be the most performant one, but it is almost as ugly as not having the parent in a separate doc at all.

I may have put some wrong information here, as I didn't test every approach; I'm only using common knowledge with some guessing, but I hope I have described our problems well. I would like some advice: maybe I missed a better approach? Thanks in advance
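As a sketch of the maintenance step these approaches need (index, type, and field names are hypothetical), a bookmark could be recorded on a parent doc with an update script such as the following, sent to /myindex/parent/1/_update:

```json
{
  "script": "if (!ctx._source.bookmarked_by.contains(user)) { ctx._source.bookmarked_by += user }",
  "params": { "user": "roger" },
  "lang": "groovy"
}
```

The contains() guard is there so that repeating the request does not append the same user twice.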
Re: Advice on migrating 1.3.2 to 1.4.1
Thank you for your response. It looks like I misread the documentation: only the "Fields referred to in alias filters must exist in the mappings of the index/indices pointed to by the alias." part was added in 1.4.0.Beta1.

Anyway, I found the terms lookup mechanism (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-terms-filter.html#_terms_lookup_mechanism), which also solves our problem of sending a big filter in every request.

What we are doing is that the user, after doing a search, can bookmark the results. Then he has the possibility to do new searches on his bookmarked docs only. Adding metadata to our docs with information about who bookmarked them would work, too; it will only be harder to update, because the user can bookmark/un-bookmark them on the fly and in batches (like "bookmark all docs of the search result"). I will study the approaches to see which one fits us better. Thank you very much.

2014-12-03 11:46 GMT-02:00 Adrien Grand adrien.gr...@elasticsearch.com:

Hi, 1.4 changed a lot of things, especially at the distributed-system level, so testing it in your staging environment will certainly help ensure that things work as expected. Filtered aliases have been available for a long time (even before 1.4.0.Beta1), so it's very likely that they are already available in the version you are running. However, a filter containing 10 million ids will be slow anyway; even if you cache it, the first execution on a new segment might cause latency spikes, since there are lots of postings lists that need to be merged. Would it be possible to change it to a simpler term filter, e.g. by adding more metadata to your documents?

--
Adrien Grand
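For reference, the terms lookup mechanism mentioned above lets a terms filter load its value list from another document instead of the request body; a sketch, with hypothetical index, type, and field names:

```json
{
  "filter": {
    "terms": {
      "parent_name": {
        "index": "bookmarks",
        "type": "user",
        "id": "roger",
        "path": "bookmarked_parents"
      }
    }
  }
}
```

The filter then behaves as if the terms stored in the bookmarks/user/roger document's bookmarked_parents field had been sent inline, so the big list lives in one document instead of in every request.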
Re: Trouble formulating a query with Java API
You can use the toString() method of the SearchRequestBuilder to see the generated query. With your example it was:

{
  "size": 10,
  "query": {
    "multi_match": {
      "query": "searchterm",
      "fields": [ "FIELD1.not_analyzed", "FIELD2.partial" ]
    }
  },
  "sort": [ { "SCORE": { "order": "desc" } } ]
}

This query looks ok. Are you not receiving any results? Not even the total value?

2014-11-28 16:11 GMT-02:00 Maarten Roosendaal mroosendaa...@gmail.com:

Hi, I have the following (json) query I use:

{
  "fields": [ "ID", "ID2" ],
  "query": {
    "filtered": {
      "query": {
        "multi_match": {
          "query": "searchterm",
          "fields": [ "FIELD1.not_analyzed", "FIELD2.partial" ]
        }
      },
      "filter": {
        "bool": {
          "must": [
            { "term": { "FIELD2": "No" } },
            { "term": { "FIELD3": "Yes" } }
          ]
        }
      }
    }
  },
  "sort": [ { "SCORE": { "order": "desc" } } ]
}

but my attempts at building the same query with the Java API haven't been fruitful. The goal is to search for a match based on a search term in several fields, where some fields are more important than others. I know the basic setup:

client
    .prepareSearch()
    .setSearchType(SearchType.QUERY_AND_FETCH)
    .setQuery(QueryBuilders.multiMatchQuery("searchterm", "FIELD1.not_analyzed", "FIELD2.partial"))
    .addSort("SCORE", SortOrder.DESC)
    .setSize(10)
    .execute()
    .actionGet();

but there are no results, while the json query does return results. So 2 questions: 1) I could use some help formulating the right Java query 2) why does the json query return stuff and the java query not? Thanks, Maarten
Advice on migrating 1.3.2 to 1.4.1
Hello. We currently have a cluster with 50 million docs using Elasticsearch version 1.3.2. We were looking for something like a persisted filter, and filtered aliases, added in version 1.4.0.Beta1, seem perfect for it. Our infrastructure team is not happy about upgrading production without doing a lot of tests first, so we have to run a lot of tests and upgrade later. We are looking for advice on what can go wrong with this upgrade: what are the risks? Also, is there a way to implement a persistent filter in our current version? I mean, some of our users will have access to only a part of our data; we need something like a database view. We could send a filter in every request, but that would be too slow with, let's say, 10 million ids.
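For context, a filtered alias is registered through the _aliases endpoint; a minimal sketch with hypothetical index, alias, and field names:

```json
{
  "actions": [
    {
      "add": {
        "index": "docs",
        "alias": "docs_for_roger",
        "filter": { "term": { "allowed_user": "roger" } }
      }
    }
  ]
}
```

Searches against docs_for_roger then see only documents matching the alias filter, which behaves much like a persisted database view.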
Re: Aggregation with a whole string as key
You have to index it as a single token. You can have the same string indexed twice using multi fields: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_multi_fields.html#_multi_fields Then you can index the string not analyzed (as in the multi fields page's example), or using the keyword tokenizer if you need the field analyzed: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-keyword-tokenizer.html#analysis-keyword-tokenizer

2014-11-18 11:46 GMT-02:00 Jörgen Lundberg jorgen.lundb...@gmail.com:

Hi all, I asked this question at Stack Overflow last week: http://stackoverflow.com/questions/26909312/is-it-possible-to-aggregate-over-a-whole-string-in-a-logstash-query In Kibana I'm trying to aggregate the top errors in our log by aggregating over a term we call LogMessage. This works well except that the aggregation counts the number of times each word in the LogMessage appears. Is it possible to aggregate over a whole string, or am I thinking about this the wrong way? /Jörgen
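Putting the two links together, a multi-field mapping sketch (index, type, and subfield names are hypothetical) that keeps LogMessage analyzed for search while adding a not_analyzed subfield to aggregate on:

```json
{
  "mappings": {
    "logs": {
      "properties": {
        "LogMessage": {
          "type": "string",
          "fields": {
            "raw": { "type": "string", "index": "not_analyzed" }
          }
        }
      }
    }
  }
}
```

A terms aggregation on LogMessage.raw then counts whole messages as single keys instead of counting each word separately.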
Advice on mapping a field with a huge text value
Hello. I have to create a mapping for a type that will have a text field with values:
- that are huge (more than 32KB),
- that are very badly structured, with snippets like "elas tic search" that I need to find when the user searches for "elasticsearch" or "elastic search".
I can't modify the source text (it is extracted from a PDF file), so I have to handle these issues in the type mapping. Can someone give me advice on how to map this field?
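One direction that might help with fragments like "elas tic search" (a sketch only, with hypothetical analyzer and field names; it handles adjacent splits, not arbitrary ones) is a shingle filter with an empty token separator, so that neighboring tokens are additionally indexed joined together, e.g. "elas" + "tic" + "search" also produces "elasticsearch":

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "joined_shingles": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 3,
          "token_separator": "",
          "output_unigrams": true
        }
      },
      "analyzer": {
        "joined_text": {
          "tokenizer": "standard",
          "filter": [ "lowercase", "joined_shingles" ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "content": { "type": "string", "analyzer": "joined_text" }
      }
    }
  }
}
```

output_unigrams keeps the original single tokens searchable too; the trade-off is a considerably larger index for such a big field.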
Resume scroll-scan query?
I'm reindexing an Elasticsearch index with 50m docs using a scroll-scan request to retrieve all docs, but my reindexer program stopped at 30m. Is there a way to redo the query to retrieve the remaining docs, like using an offset? Would the internal order of the scan query be the same on a second request? I can assure that no new docs were indexed into the old index since the beginning of the reindexing.
Re: Resume scroll-scan query?
Hmm, I was using a small ttl, just enough to process each scroll call, but I could try using a longer time-to-live and resuming from the last scroll_id in case of error. That is a good idea, thanks.

2014-10-23 17:12 GMT-02:00 John Smith java.dev@gmail.com:

The scroll is available based on a timeout value you give it. Every time you scroll, you restart the countdown. You could track the last scroll id you used and try again from there?
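For reference, a scan is started with search_type=scan and a scroll timeout in the URL (e.g. /_search?search_type=scan&scroll=10m) and a body like the sketch below; each subsequent page is then fetched by sending the most recently returned scroll_id to /_search/scroll?scroll=10m, which also restarts the timeout:

```json
{
  "query": { "match_all": {} },
  "size": 500
}
```

Note that with scan the size applies per shard, so each page can return up to size times the number of shards documents.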
Re: Resume scroll-scan query?
I know it resets the ttl on each scroll call, but since I don't have an automatic resuming process, I need to manually check the last scroll_id (I will log it to a file) and restart the reindexing program using it. That is why I need a longer ttl.

I just tested the re-use of the scroll_id. It looks like, after the first request, the same scroll_id is returned over and over while still returning new docs. So I can't use this approach, since I would always lose the last batch after resuming the reindexing.

2014-10-23 18:20 GMT-02:00 John Smith java.dev@gmail.com:

A small ttl is ok (well, adjusted properly for your process), because every time you call scroll it resets the ttl. So you don't need a 60m scroll time; it just has to be long enough to process the next scroll id. I'm curious whether you can re-use the scroll id. It's not specifically mentioned in the docs, but I think scroll is forward-only, so I'm not sure that once you got one scroll id you can go back to it. I guess there's one way to find out :)