Re: access parent bucket's key from child aggregation in geohash grid
Hi,

Unfortunately, accessing the parent bucket key is not possible.

On Fri, May 2, 2014 at 12:04 AM, Thomas Gruner tom.gru...@gmail.com wrote:

Hello! I have been progressing well with aggregations, but this one has me stumped. I'm trying to figure out how to access the key of the parent bucket from a child aggregation. The parent bucket is geohash_grid, and the child aggregation is avg (trying to get the average lat and lon, but only for points that match the parent bucket's geohash key). Something like this:

    "aggregations": {
      "LocationsGrid": {
        "geohash_grid": { "field": "Locations", "precision": 7 },
        "aggregations": {
          "avg_lat": {
            "avg": {
              "script": "if (doc['Locations'].value.geohash.startsWith(parent_bucket.key)) doc['Locations'].value.lat;"
            }
          }
        }
      }
    }

Thanks for any help or ideas with this!

--
Adrien Grand

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
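One detail worth noting: every document that lands in a geohash_grid bucket already matched that bucket's cell, so plain avg sub-aggregations on lat/lon give per-cell averages without any parent-key check. A minimal sketch of such a request body, assuming ES 1.x script syntax (`doc['field'].lat` / `.lon`) with dynamic scripting enabled, and reusing the field names from the question:

```python
# Sketch only: request body for per-cell average lat/lon. Assumes ES 1.x
# MVEL script syntax (doc['field'].lat / doc['field'].lon) and dynamic
# scripting enabled; the "Locations" field name comes from the question.
# No parent-key check is needed: a geohash_grid bucket only ever contains
# documents whose geohash falls inside that bucket's cell.
request_body = {
    "aggregations": {
        "LocationsGrid": {
            "geohash_grid": {"field": "Locations", "precision": 7},
            "aggregations": {
                "avg_lat": {"avg": {"script": "doc['Locations'].lat"}},
                "avg_lon": {"avg": {"script": "doc['Locations'].lon"}},
            },
        }
    }
}
```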
stats, extended stats, percentiles for doc_count in aggregations
Is it possible to get stats, extended stats, or percentiles across the doc_counts in each bucket of an aggregation? I see how to use them on an existing numeric field value (e.g., height, grade), but I want to see the average bucket size, stddev, or other stats on how one doc_count compares to the doc_counts in the other buckets.
Re: Read/Write consistency
Hi Mohit,

I think the transaction log takes care of that, because there's a copy on all instances of the same shard, and they need to be in sync.

Best regards,
Radu
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Thu, May 1, 2014 at 9:57 PM, Mohit Anchlia mohitanch...@gmail.com wrote:

What's not clear is how does Elasticsearch identify what pieces of data are missing between the primary and the replica?

On Wed, Apr 30, 2014 at 3:27 AM, Radu Gheorghe radu.gheor...@sematext.com wrote:

Hi Mohit, I'll answer inline.

On Mon, Apr 28, 2014 at 4:57 PM, Mohit Anchlia mohitanch...@gmail.com wrote:

Trying to understand the following scenarios of consistency in Elasticsearch:

1) Sync replication - how does Elasticsearch deal with consistency issues that may arise from a node momentarily going down and missing writes?

This depends on the write consistency setting. By default, the operation only succeeds if a quorum of replicas can index the document: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html#index-consistency

When the node comes back up, could reads going to the non-primary shards get inconsistent data?

No, when the node comes back up it will sync the operations it missed with the other nodes.

2) Async replication - what happens if replication is slow for some reason; could users see inconsistent data?

Yes, if you hit a shard that didn't get the latest operation, you could see an old version of the data. You can use preference to try to hit the primary shard all the time, but then your replicas will just be sitting there for redundancy: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-preference.html

3) Sync/async replication - how does Elasticsearch keep data in sync for those writes that never happened on the non-primary shard because of network/node failures?
It either uses the transaction log or it transfers the whole shard to that node.

Best regards,
Radu
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/
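The quorum mentioned above can be made concrete. A sketch of the arithmetic from the ES index API docs (the function name is mine, not an ES API):

```python
def write_quorum(number_of_replicas):
    """Number of shard copies (primary + replicas) that must be active for
    a write to proceed under consistency=quorum. Per the ES docs, the
    quorum check is only enforced when number_of_replicas > 1."""
    copies = 1 + number_of_replicas  # the primary plus its replicas
    return copies // 2 + 1

print(write_quorum(2))  # 3 copies total -> 2 must be active
```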
Re: stats, extended stats, percentiles for doc_count in aggregations
Hi,

There is currently no way to do that, but I think it could be done on the client side?

On Fri, May 2, 2014 at 8:56 AM, Loren lo...@siebert.org wrote:

Is it possible to get stats, extended stats, or percentiles across the doc_counts in each bucket of an aggregation? I see how to use them on an existing numeric field value (e.g., height, grade), but I want to see the average bucket size, stddev, or other stats on how one doc_count compares to the doc_counts in the other buckets.

--
Adrien Grand
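A minimal client-side sketch of what Adrien suggests (the aggregation name and bucket keys are made up): pull the doc_count out of each bucket of the aggregation response and run ordinary statistics over them.

```python
import statistics

# Hypothetical aggregation response fragment, as returned by ES
response = {
    "aggregations": {
        "by_category": {
            "buckets": [
                {"key": "a", "doc_count": 10},
                {"key": "b", "doc_count": 30},
                {"key": "c", "doc_count": 20},
            ]
        }
    }
}

# Extract the per-bucket document counts and compute stats client-side
counts = [b["doc_count"] for b in response["aggregations"]["by_category"]["buckets"]]
print("avg bucket size:", statistics.mean(counts))
print("stddev:", round(statistics.pstdev(counts), 2))
print("max/min:", max(counts), min(counts))
```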
Re: Help with ES 1.x percolator query plz
Hi,

Can you share the stored percolator queries and the percolate request that you were initially trying with, but didn't work?

Martijn

On 2 May 2014 11:14, JGL j.g.liu...@gmail.com wrote:

Can anybody help plz?

--
Met vriendelijke groet,
Martijn van Groningen
Re: strange problem: my ES server almost lost all its data. (All shards failed)
On FreeBSD, do you have multicast on IPv6 enabled? You should disable IPv6 on the JVM. It seems you received a severe network error from the OS.

Jörg

On Thu, May 1, 2014 at 11:46 PM, Patrick Proniewski elasticsea...@patpro.net wrote:

Hello,

I'm running a small server with Logstash, ES, and Kibana. Tonight I restarted my ES process. Very bad idea: it restarted with lots of errors, and finally lost all its data. Basically, before the restart I had:

    elasticsearch/nodes/0/indices/logstash-2014.*
    elasticsearch/nodes/0/_state/

After the restart I had:

    elasticsearch/nodes/0/indices/logstash-2014.*
    elasticsearch/nodes/0/_state/
    elasticsearch/nodes/1/indices/logstash-2014.05.01
    elasticsearch/nodes/1/_state/

Then Kibana was not able to find anything (dashboards lost, etc.). I stopped Logstash, stopped Elasticsearch, waited a bit and checked everything was down, then restarted ES. It looked OK, then I restarted Logstash, and I was able to access my dashboards again. I only lost 15 minutes of data. Now I can see that elasticsearch/nodes/0 is the current working directory, and I can browse old and current data. elasticsearch/nodes/1 is not used anymore.

I'm running FreeBSD, and used the service command to restart ES. When attempting the second shutdown, the script wouldn't find the pid file, so I had to kill the Java process. I don't understand what happened, but I don't feel comfortable putting ES in production.

Full log for first and second restart here: http://patpro.net/elastic.log

Any idea?

Regards,
Patrick
Re: strange problem: my ES server almost lost all its data. (All shards failed)
Hi Jörg,

Thank you for your reply. The service script includes an option that might deal with IPv6, but it's not active:

    # Force the JVM to use IPv4 stack
    # elasticsearch_props="-Djava.net.preferIPv4Stack=true"

(http://svnweb.freebsd.org/ports/head/textproc/elasticsearch/files/elasticsearch.in?revision=349955)

In past years I used to disable IPv6 everywhere (kernel, ports compilation, etc.), but now I don't bother anymore. Do you mean I should use this option to force IPv4?

Thanks,
Patrick

On 2 mai 2014, at 09:38, joergpra...@gmail.com wrote:

On FreeBSD, do you have multicast on IPv6 enabled? You should disable IPv6 on the JVM. Seems you received a severe network error from the OS.

Jörg

[...]
Re: How to write a custom river
Hi Joshua,

The package is not an issue if you are using the default one for your classes. Looking deeper, the type of the river that you try to register with your REST call doesn't match the type of the river you registered in the plugin when you did module.registerRiver(type, riverClass).

Cheers
Luca

On Friday, May 2, 2014 6:08:28 AM UTC+2, Rob Ottaway wrote:

I should have sent you the following earlier rather than a non-river plugin:

The plugin: https://github.com/elasticsearch/elasticsearch-river-rabbitmq/blob/master/src/main/java/org/elasticsearch/plugin/river/rabbitmq/RabbitmqRiverPlugin.java
The river implementation: https://github.com/elasticsearch/elasticsearch-river-rabbitmq/blob/master/src/main/java/org/elasticsearch/river/rabbitmq/RabbitmqRiver.java
The module: https://github.com/elasticsearch/elasticsearch-river-rabbitmq/blob/master/src/main/java/org/elasticsearch/river/rabbitmq/RabbitmqRiverModule.java

It looks like you are registering the river implementation rather than the river module, hence it's not working. I had to look at an example I know works to figure it out.

-Rob

On Thu, May 1, 2014 at 8:04 PM, Joshua Chan joshua.bennett.c...@gmail.com wrote:

So, that's what I did, but no love... I checked in the latest.

-Josh

On Thursday, May 1, 2014 9:49:11 PM UTC-5, Rob Ottaway wrote:

Look at this plugin for help: https://github.com/elasticsearch/elasticsearch-cloud-aws/blob/master/src/main/resources/es-plugin.properties

Yes, it needs to be the FQN.

On Thursday, May 1, 2014 5:47:31 PM UTC-7, Joshua Chan wrote:

Thanks Rob. Someone else also told me the plugin property should be the fully qualified name. I didn't declare a package, so I guess I'm using the default package, and I thought I had the namespacing right since IntelliJ corrected the class name when I wrote it. Thoughts?
-Josh

On Thursday, May 1, 2014 5:23:25 PM UTC-5, Rob Ottaway wrote:

Look at this file in your BB repo: https://bitbucket.org/futurechan/example-river/src/fd23648c3e7cc42fd2286d4134e80ecd7e98f802/src/main/resources/es-plugin.properties?at=master

cheers

On Thursday, May 1, 2014 3:21:59 PM UTC-7, Rob Ottaway wrote:

This strikes me as odd:

    java.lang.ClassNotFoundException: example_river

I assume you didn't map the string example_river to the actual class name properly?

-Rob

On Thursday, May 1, 2014 11:40:52 AM UTC-7, Joshua Chan wrote:

I'm making my first go at writing a river. (Here's the source code: https://bitbucket.org/futurechan/example-river/src) I followed this tutorial http://blog.trifork.com/2013/01/10/how-to-write-an-elasticsearch-river-plugin/ and compared it to this existing river https://github.com/jprante/elasticsearch-river-jdbc but I haven't had much luck.

To deploy the river, I created a folder called example-river under plugins, dropped my jar in that folder, and restarted the node. Everything starts up fine. I have also tried bin/plugin --url file:///path/to/plugin --install example-river, which seems to work, but it unpacks my jar. So I tried zipping it first and then installing, which works and does not unpack my jar, but it didn't help.
When I issue this PUT request: http://localhost:9200/_river/example_river/_meta

    {
      "type": "example_river",
      "example_river": { "blah": "blah" }
    }

I get this exception:

    [2014-04-20 22:28:46,538][DEBUG][river ] [Gloom] creating river [example_river][example_river]
    [2014-04-20 22:28:46,543][WARN ][river ] [Gloom] failed to create river [example_river][example_river]
    org.elasticsearch.common.settings.NoClassSettingsException: Failed to load class with value [example_river]
        at org.elasticsearch.river.RiverModule.loadTypeModule(RiverModule.java:87)
        at org.elasticsearch.river.RiverModule.spawnModules(RiverModule.java:58)
        at org.elasticsearch.common.inject.ModulesBuilder.add(ModulesBuilder.java:44)
        at org.elasticsearch.river.RiversService.createRiver(RiversService.java:137)
        at org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:275)
        at org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:269)
        at org.elasticsearch.action.support.TransportAction$ThreadedActionListener$1.run(TransportAction.java:93)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
    Caused by: java.lang.ClassNotFoundException: example_river
        at java.net.URLClassLoader$1.run(Unknown Source)
        at java.net.URLClassLoader$1.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at
Re: checkIndex with ES 1.1.0
Here I go replying to my own questions again. ES spreads Lucene files over all data paths, which means that checkIndex doesn't work, as it expects all files to be in one directory. What I did was create a directory, make symlinks to all the files across all data paths, and then run checkIndex. This works as expected.

On Wednesday, 9 April 2014 11:41:24 UTC+2, Michael Salmon wrote:

I recently had a problem with an index and after searching the net I decided to give checkIndex a try. I found the class in the right jar, but I haven't been able to get it to check an index. For example, when I run checkIndex -verbose ...heat-analyzer/7/index I get:

    ERROR: could not read any segments file in directory
    java.io.FileNotFoundException: ...heat-analyzer/7/index/segments_37l (No such file or directory)

That is correct; the directory contains:

    _checksums-1397035754356  _j85h_es090_0.blm     _j8fu.cfe          _j91a.cfs
    _isy5.nvm                 _j86v.cfe             _j8g4.cfs          _j95w_es090_0.tim
    _j821.si                  _j870.cfs             _j8hl.si           _j963.si
    _j82l.cfs                 _j898_Lucene45_0.dvd  _j8ln.cfs          _j968.cfs
    _j838.cfe                 _j8ap.cfe             _j8t6.cfe          _j975.si
    _j83a.fdx                 _j8b6.cfs             _j8tb_es090_0.doc  _j9b9.cfs
    _j83o.nvd                 _j8c1.si              _j8tp.cfs          _j9di_es090_0.blm
    _j83o.si                  _j8cz.si              _j8v9.fnm          _j9e0.cfe
    _j83p.nvd                 _j8et.cfs             _j8yv.cfe          segments.gen
    _j84u.cfs                 _j8fa.cfs             _j919.fdt          write.lock

I am running ES 1.1.0 and using the checkIndex from Lucene 4.7.0. Has anyone gotten checkIndex to work with this combination?

/Michael
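The symlink workaround can be sketched as follows (the data paths and the Lucene jar name are hypothetical; substitute your own path.data entries and the Lucene version your ES ships with):

```shell
# Collect the Lucene files that ES spread across multiple data paths into
# one directory of symlinks, then point CheckIndex at that directory.
merged=/tmp/merged-index
mkdir -p "$merged"
for p in /data1/es/nodes/0/indices/myindex/7/index \
         /data2/es/nodes/0/indices/myindex/7/index; do  # hypothetical paths
  ln -s "$p"/* "$merged"/ 2>/dev/null || true
done
# Then run the checker (hypothetical jar path; must match ES's Lucene version):
# java -cp lucene-core-4.7.0.jar org.apache.lucene.index.CheckIndex "$merged" -verbose
```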
Re: Aggregation bug? Or user error?
I haven't been able to figure out what is required to recreate it. I am doing a number of identical aggregations (just different values of intentMarketCode and intentDate). Three aggregations give correct numbers; one doesn't. I haven't figured out why.

On Wednesday, 30 April 2014 14:13:00 UTC+1, Adrien Grand wrote:

This looks wrong indeed. By any chance, would you have a curl recreation of this issue?

On Tue, Apr 29, 2014 at 7:35 PM, mooky nick.mi...@gmail.com wrote:

It looks like a bug to me - but if it's user error, then obviously I can fix it a lot quicker :)

On Tuesday, 29 April 2014 13:04:53 UTC+1, mooky wrote:

I am seeing some very odd aggregation results - the sum of the sub-aggregations is more than the parent bucket. Results:

    "CSSX": {
      "doc_count": 24,
      "intentDate": {
        "buckets": [
          {
            "key": "Overdue",
            "to": 1.3981248E12,
            "to_as_string": "2014-04-22",
            "doc_count": 1,
            "ME": { "doc_count": 0 },
            "NOT_ME": { "doc_count": 24 }
          },
          {
            "key": "May",
            "from": 1.3981248E12,
            "from_as_string": "2014-04-22",
            "to": 1.4006304E12,
            "to_as_string": "2014-05-21",
            "doc_count": 23,
            "ME": { "doc_count": 0 },
            "NOT_ME": { "doc_count": 24 }
          },
          {
            "key": "June",
            "from": 1.4006304E12,
            "from_as_string": "2014-05-21",
            "to": 1.4033088E12,
            "to_as_string": "2014-06-21",
            "doc_count": 0,
            "ME": { "doc_count": 0 },
            "NOT_ME": { "doc_count": 24 }
          }
        ]
      }
    },

I wouldn't have thought that to be possible at all. Here is the request that generated the dodgy results.
    "CSSX": {
      "filter": {
        "and": {
          "filters": [
            { "type": { "value": "inventory" } },
            { "term": { "isAllocated": false } },
            { "term": { "intentMarketCode": "CSSX" } },
            { "terms": { "groupCompanyId": [
              "0D13EF2D0E114D43BFE362F5024D8873", "0D593DE0CFBE49BEA3BF5AD7CD965782",
              "1E9C36CC45C64FCAACDEE0AF4FB91FBA", "33A946DC2B0E494EB371993D345F52E4",
              "6471AA50DFCF4192B8DD1C2E72A032C7", "9FB2FFDC0FF0797FE04014AC6F0616B6",
              "9FB2FFDC0FF1797FE04014AC6F0616B6", "9FB2FFDC0FF2797FE04014AC6F0616B6",
              "9FB2FFDC0FF3797FE04014AC6F0616B6", "9FB2FFDC0FF5797FE04014AC6F0616B6",
              "9FB2FFDC0FF6797FE04014AC6F0616B6", "AFE0FED33F06AFB6E04015AC5E060AA3"
            ] } },
            { "not": { "filter": { "terms": { "status": [ "Cancelled", "Completed" ] } } } }
          ]
        }
      },
      "aggregations": {
        "intentDate": {
          "date_range": {
            "field": "intentDate",
            "ranges": [
              { "key": "Overdue", "to": "2014-04-22" },
              { "key": "May", "from": "2014-04-22", "to": "2014-05-21" },
              { "key": "June", "from": "2014-05-21", "to": "2014-06-21" }
            ]
          },
          "aggregations": {
            "ME": { "filter": { "term": { "trafficOperatorSid": "S-1-5-21-20xx..." } } }
            ...
Re: Problem while searching for date range or date
Found the solution to the same problem here: https://groups.google.com/forum/#!searchin/elasticsearch/date/elasticsearch/eeTwWVf6Sfo/1jbHq0gca6QJ

Thanks.

On Friday, May 2, 2014 3:13:30 PM UTC+5:30, Hemant wrote:

Hello,

I have indexed some data, with default mapping:

    {
      "inventory": {
        "products": {
          "properties": {
            "exp_date": { "type": "date", "format": "dateOptionalTime" },
            "man_date": { "type": "date", "format": "dateOptionalTime" },
            "price": { "type": "long" },
            "product_description": { "type": "string" },
            "product_name": { "type": "string" },
            "quan_available": { "type": "long" }
          }
        }
      }
    }

Now when I perform a search to match some date, I am not getting the expected result. A query like this:

    {
      "query": {
        "filtered": {
          "query": {
            "query_string": { "query": "exp_date:[2013-03-1 TO 2013-03-5]" }
          }
        }
      },
      "fields": [ "price", "quan_available", "product_name", "product_description", "exp_date", "man_date" ],
      "from": 0,
      "size": 50,
      "sort": { "_score": { "order": "asc" } },
      "explain": true
    }

gives me the expected result, that is, all the documents matching this date range. But when I remove the field name exp_date from the query string, I get no results at all. The following query returns zero results:

    {
      "query": {
        "filtered": {
          "query": {
            "query_string": { "query": "[2013-03-1 TO 2013-03-5]" }
          }
        }
      },
      "fields": [ "price", "quan_available", "product_name", "product_description", "exp_date", "man_date" ],
      "from": 0,
      "size": 50,
      "sort": { "_score": { "order": "asc" } },
      "explain": true
    }

Can anybody suggest a solution to this problem? What am I doing wrong? Thanks in advance.
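For readers hitting the same thing: the likely cause (assuming it matches the fix in the linked thread) is that a field-less query_string runs against the _all field, where the range is not interpreted against a date-typed field; naming the date field explicitly restores the hits. A sketch of the adjusted query body:

```python
# Sketch (assumption, consistent with the linked thread's fix): point the
# field-less query_string at the date field via default_field instead of
# letting it fall through to _all.
query = {
    "query": {
        "filtered": {
            "query": {
                "query_string": {
                    "default_field": "exp_date",  # or "fields": ["exp_date", "man_date"]
                    "query": "[2013-03-1 TO 2013-03-5]",
                }
            }
        }
    },
    "from": 0,
    "size": 50,
}
```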
Re: Partial word match with singular and plurals: Elasticsearch
Any help? Why is the higher-distance document scored higher? Is there a problem with the stemmer or nGram settings?

On Thursday, May 1, 2014 8:37:09 AM UTC-4, Kruti Shukla wrote:

Hi Radu,

Thank you for the suggestions. I knew about multi-field but didn't realize how helpful it could be; now I'm able to play with the multi-field feature. I followed the suggestion and created the index and mapping accordingly. I tried querying the first two ways: the first one simple, and the second one with slop. It is not returning the correct slop (i.e., incremental distance). Please help/suggest query improvements.

Please see my settings below.

For the index:

    curl -XPUT "http://localhost:9200/my_improved_index" -d'
    {
      "settings": {
        "analysis": {
          "filter": {
            "trigrams_filter": { "type": "ngram", "min_gram": 1, "max_gram": 50 },
            "my_stemmer": { "type": "stemmer", "name": "minimal_english" }
          },
          "analyzer": {
            "trigrams": {
              "type": "custom",
              "tokenizer": "standard",
              "filter": [ "standard", "lowercase", "trigrams_filter" ]
            },
            "my_stemmer_analyzer": {
              "type": "custom",
              "tokenizer": "standard",
              "filter": [ "standard", "lowercase", "my_stemmer" ]
            }
          }
        }
      }
    }'

For the mappings:

    curl -XPUT "http://localhost:9200/my_improved_index/my_improved_index_type/_mapping" -d'
    {
      "my_improved_index_type": {
        "properties": {
          "name": {
            "type": "multi_field",
            "fields": {
              "name_gram": { "type": "string", "analyzer": "trigrams" },
              "untouched": { "type": "string", "index": "not_analyzed" },
              "name_stemmer": { "type": "string", "analyzer": "my_stemmer_analyzer" }
            }
          }
        }
      }
    }'

Available documents:

1. men's shaver
2. men's shavers
3. men's foil shaver
4. men's foils shaver
5. men's foil shavers
6. men's foils shavers
7. men's foil advanced shaver
8. norelco men's foil advanced shaver

Query:

    curl -XPOST "http://localhost:9200/my_improved_index/my_improved_index_type/_search" -d'
    {
      "size": 30,
      "query": {
        "bool": {
          "should": [
            {
              "match": {
                "name.untouched": {
                  "query": "men'\''s shaver",
                  "operator": "and",
                  "type": "phrase",
                  "boost": 10
                }
              }
            },
            {
              "match_phrase": {
                "name.name_stemmer": {
                  "query": "men'\''s shaver",
                  "slop": 5
                }
              }
            }
          ]
        }
      }
    }'

Returned result:

1. men's shaver -- correct
2. men's shavers -- correct
3. men's foils shaver -- NOT correct
4. norelco men's foil advanced shaver -- NOT correct
5. men's foil advanced shaver -- NOT correct
6. men's foil shaver -- NOT correct

Expected result:

1. men's shaver -- exact phrase match
2. men's shavers -- zero word distance + 1 plural
3. men's foil shaver -- 1 word distance
4. men's foils shaver -- 1 word distance + 1 plural
5. men's foil advanced shaver -- 2 word distance
6. norelco men's foil advanced shaver -- 2 word distance

Why is the higher-distance document scored higher? Is there a problem with the stemmer or nGram settings?

On Thursday, May 1, 2014 7:26:02 AM UTC-4, Radu Gheorghe wrote:

Hi Kruti,

The short answer is yes, it is possible. Here's one way to do it: have the fields you search on as a multi field (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_multi_fields.html), where you index them with various settings, like once not-analyzed for exact matches, once with ngrams to account for typos, and so on. You can query all those sub-fields, and use the multi-match query with best fields (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html#type-best-fields) or the DisMax query (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-dis-max-query.html) to wrap all those queries and take the best score (or the best score and a factor of the other scores by using the tie breaker).

Now, for the specific requirements you have:

1. For exact matching, you can skip analysis altogether, and set
Re: Unable to create mapping and settings using Java API
Hmm, I'm able to create an index and its mappings/settings with a single JSON request to http://localhost:9200/indexName. What settings are you trying to set?

Mike
http://blog.mikemccandless.com

On Thu, May 1, 2014 at 5:10 PM, Amit Soni amitson...@gmail.com wrote:

hello everyone - I have settings and mapping defined in a single JSON document, and I have been trying to find a way to create an index using that JSON document. I tried different code snippets but have not found one which allows me to create settings as well as mapping using one JSON document. Any help on this will be great!

-Amit.
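As Mike says, one JSON body can carry both sections. A minimal sketch of such a body (index, type, and field names are made up); the same JSON string can then be PUT to the index URL over HTTP, or, if memory serves for the 1.x Java API, passed to prepareCreate(...).setSource(...) - worth verifying against your client version:

```python
import json

# Hypothetical index body: "settings" and "mappings" side by side in one
# JSON document, as accepted by PUT http://localhost:9200/my_index
body = {
    "settings": {"number_of_shards": 1, "number_of_replicas": 0},
    "mappings": {
        "my_type": {
            "properties": {
                "name": {"type": "string"},
                "created": {"type": "date"},
            }
        }
    },
}
payload = json.dumps(body)  # send this single string as the request body
```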
Re: SearchParseExceptions in Marvel monitoring cluster
Thanks Boaz for your reply. Following is the output of curl SERVER:9200/_cat/shards/?v for both nodes of our Marvel cluster:

index              shard prirep state   docs store   ip          node
.marvel-2014.05.01 0     p      STARTED 70   865.4kb Server-ip-1 Marvel_1
.marvel-2014.05.01 0     r      STARTED 70   865kb   Server-ip-2 Marvel_2

Some more things to highlight: in the Marvel Dashboard - Cluster Overview page we get the following errors:
- Oops! FacetPhaseExecutionException[Facet [0]: (value) field [total.search.query_total] not found] --- in the Search Request Rate panel
- Oops! FacetPhaseExecutionException[Facet [timestamp]: failed to find mapping for index.raw] --- in the Indices panel
- Oops! FacetPhaseExecutionException[Facet [0]: (value) field [primaries.indexing.index_total] not found] --- in the Indexing Request Rate panel
- Oops! FacetPhaseExecutionException[Facet [0]: (value) field [primaries.docs.count] not found] --- in the Document Count panel

All these are apart from the SearchParseExceptions mentioned in the earlier post. Also, if Marvel is not storing the right data, how is that supposed to be handled? - Regards -- View this message in context: http://elasticsearch-users.115913.n3.nabble.com/SearchParseExceptions-in-Marvel-monitoring-cluster-tp4054926p4055150.html Sent from the ElasticSearch Users mailing list archive at Nabble.com.
Re: strange problem: my ES server almost lost all its data. (All shards failed)
Yes, you should use this option. Some FreeBSD kernels seem to have difficulties running UDP multicast on IPv6 together with IPv4 properly, so I would suggest disabling IPv6 use on the JVM. Jörg On Fri, May 2, 2014 at 10:23 AM, Patrick Proniewski elasticsea...@patpro.net wrote: Hi Jörg, Thank you for your reply. The service script includes an option that might deal with IPv6, but it's not active: # Force the JVM to use IPv4 stack # elasticsearch_props="-Djava.net.preferIPv4Stack=true" ( http://svnweb.freebsd.org/ports/head/textproc/elasticsearch/files/elasticsearch.in?revision=349955 ) In past years, I used to disable IPv6 everywhere (kernel, ports compilation, etc.) but now I don't bother anymore. Do you mean I should use this option to force IPv4? Thanks, Patrick On 2 mai 2014, at 09:38, joergpra...@gmail.com wrote: On FreeBSD, do you have multicast on IPv6 enabled? You should disable IPv6 on the JVM. It seems you received a severe network error from the OS. Jörg On Thu, May 1, 2014 at 11:46 PM, Patrick Proniewski elasticsea...@patpro.net wrote: Hello, I'm running a small server with logstash, ES, Kibana. Tonight, I restarted my ES process. Very bad idea: it restarted with lots of errors, and finally lost all its data. Basically, before restart, I had: elasticsearch/nodes/0/indices/logstash-2014.* elasticsearch/nodes/0/_state/ After restart, I had: elasticsearch/nodes/0/indices/logstash-2014.* elasticsearch/nodes/0/_state/ elasticsearch/nodes/1/indices/logstash-2014.05.01 elasticsearch/nodes/1/_state/ Then Kibana was not able to find anything (dashboards lost, etc.). I stopped Logstash, stopped Elasticsearch, waited a bit and checked that everything was down, then restarted ES. It looked OK; then I restarted Logstash, and I was able to access my dashboards again. I only lost 15 minutes of data. Now I can see that elasticsearch/nodes/0 is the current working directory, and I can browse old data and current data.
elasticsearch/nodes/1 is not used anymore. I'm running FreeBSD, and used the service command to restart ES. When attempting the second shutdown, the script couldn't find the pid file, so I had to kill the Java process. I don't understand what happened, and I don't feel comfortable putting ES in production. Full log for the first and second restarts here: http://patpro.net/elastic.log Any idea? Regards, Patrick
Re: Partial word match with singular and plurals: Elasticsearch
Hello, The exact match vs plural issue is probably because of the stemmer. As you have your fields and queries now, Elasticsearch has no way to boost individual exact word matches higher. To fix this, you can add another field where you just analyze the text using the standard analyzer (no stemming). Then add that to another query within your bool, and exact word matches should be ranked higher. Though I would do a simple match for that (no phrase), to account for the case where one word is exact and one is plural - such a document should be ranked higher than if both are plurals. You'll get that with a standard match because it looks for all terms, while match_phrase will try to match the phrase with the given slop, and neither of those two documents will get a hit. I don't know why the higher-distance document is scored higher in your case - the 6th result should have been higher. Can you try with an index of one shard and see if the results are any different? Either way, you can get an explanation for each document's score by enabling explain: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-explain.html Best regards, Radu -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Fri, May 2, 2014 at 1:40 PM, Kruti Shukla krutibhat...@gmail.com wrote: Any help? Why did the higher-distance document score higher? Is there a problem with the stemmer or nGram settings? On Thursday, May 1, 2014 8:37:09 AM UTC-4, Kruti Shukla wrote: Hi Radu, Thank you so much for the suggestions. I knew about multi-field but didn't realize how helpful it could be; now I'm able to play with the multi-field feature. I tried the following suggestion and created the index and mapping accordingly. I tried querying for the first two. The first one was simple, and the second one used slop. It is not returning results in the correct slop order (i.e., incremental distance). Please help/suggest query improvements.
*Please see my settings below:*

*For index:*
curl -XPUT "http://localhost:9200/my_improved_index" -d '
{
  "settings": {
    "analysis": {
      "filter": {
        "trigrams_filter": { "type": "ngram", "min_gram": 1, "max_gram": 50 },
        "my_stemmer": { "type": "stemmer", "name": "minimal_english" }
      },
      "analyzer": {
        "trigrams": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "standard", "lowercase", "trigrams_filter" ]
        },
        "my_stemmer_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "standard", "lowercase", "my_stemmer" ]
        }
      }
    }
  }
}'

*For mappings:*
curl -XPUT "http://localhost:9200/my_improved_index/my_improved_index_type/_mapping" -d '
{
  "my_improved_index_type": {
    "properties": {
      "name": {
        "type": "multi_field",
        "fields": {
          "name_gram": { "type": "string", "analyzer": "trigrams" },
          "untouched": { "type": "string", "index": "not_analyzed" },
          "name_stemmer": { "type": "string", "analyzer": "my_stemmer_analyzer" }
        }
      }
    }
  }
}'

*Available documents:*
1. men’s shaver
2. men’s shavers
3. men’s foil shaver
4. men’s foils shaver
5. men’s foil shavers
6. men’s foils shavers
7. men's foil advanced shaver
8. norelco men's foil advanced shaver

*Query:*
curl -XPOST "http://localhost:9200/my_improved_index/my_improved_index_type/_search" -d '
{
  "size": 30,
  "query": {
    "bool": {
      "should": [
        { "match": { "name.untouched": { "query": "men'\''s shaver", "operator": "and", "type": "phrase", "boost": 10 } } },
        { "match_phrase": { "name.name_stemmer": { "query": "men'\''s shaver", "slop": 5 } } }
      ]
    }
  }
}'

*Returned result:*
1. men's shaver -- correct
2.
Re: strange problem: my ES server almost lost all its data. (All shards failed)
Thank you for the tip, Jörg. I've activated this option and carefully restarted. I've re-read yesterday's log file, and now I think maybe the new ES instance started before the former one was completely terminated. This too can cause some network/socket trouble. I might try adding a short sleep into the restart command. On 2 mai 2014, at 14:07, joergpra...@gmail.com wrote: Yes, you should use this option. Some FreeBSD kernels seem to have difficulties running UDP multicast on IPv6 together with IPv4 properly, so I would suggest disabling IPv6 use on the JVM. Jörg On Fri, May 2, 2014 at 10:23 AM, Patrick Proniewski elasticsea...@patpro.net wrote: Hi Jörg, Thank you for your reply. The service script includes an option that might deal with IPv6, but it's not active: # Force the JVM to use IPv4 stack # elasticsearch_props="-Djava.net.preferIPv4Stack=true" ( http://svnweb.freebsd.org/ports/head/textproc/elasticsearch/files/elasticsearch.in?revision=349955 ) In past years, I used to disable IPv6 everywhere (kernel, ports compilation, etc.) but now I don't bother anymore. Do you mean I should use this option to force IPv4? Thanks, Patrick
Re: Need help on similarity ranking approach
Thanks Binh Ly and Ivan Brusic for your replies. I need to find the similarity, as a percentage, of a document against other documents, and this will be used for grouping the documents. Is it possible to get a similarity percentage using the more-like-this query? Or is there any other way to calculate the percentage of similarity from the query results? E.g.: document1 is 90% similar to document2; document1 is 45% similar to document3; etc. Thanks -- View this message in context: http://elasticsearch-users.115913.n3.nabble.com/Need-help-on-similarity-ranking-approach-tp4054847p4055227.html Sent from the ElasticSearch Users mailing list archive at Nabble.com.
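Elasticsearch relevance scores are not percentages, but one common client-side workaround is to normalize more-like-this hit scores relative to the top score. A minimal sketch, with made-up document IDs and scores:

```python
def similarity_percentages(hits):
    """Normalize hit scores to a 0-100 scale relative to the best hit.

    Note: Lucene scores are query-dependent and not absolute, so this
    yields only a relative ranking ("document2 is twice as close as
    document3"), not a true percentage of shared content.
    """
    if not hits:
        return {}
    max_score = max(h["_score"] for h in hits)
    return {h["_id"]: round(100.0 * h["_score"] / max_score, 1)
            for h in hits}

# Hypothetical hits from a more_like_this query seeded with document1
hits = [
    {"_id": "document2", "_score": 4.5},
    {"_id": "document3", "_score": 2.25},
]
print(similarity_percentages(hits))  # {'document2': 100.0, 'document3': 50.0}
```

For a true content-overlap measure, a different approach would be needed, e.g. fetching term vectors and computing a Jaccard overlap between documents client-side.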
Re: Aggregation bug? Or user error?
What version of Elasticsearch are you using? If it is small enough, I would also be interested if you could share your index so that I can try to reproduce the issue locally. On Fri, May 2, 2014 at 12:07 PM, mooky nick.minute...@gmail.com wrote: I haven't been able to figure out what is required to recreate it. I am doing a number of identical aggregations (just with different values for intentMarketCode and intentDate). Three aggregations give correct numbers - one doesn't. I haven't figured out why. On Wednesday, 30 April 2014 14:13:00 UTC+1, Adrien Grand wrote: This looks wrong indeed. By any chance, would you have a curl recreation of this issue? On Tue, Apr 29, 2014 at 7:35 PM, mooky nick.mi...@gmail.com wrote: It looks like a bug to me - but if it's user error, then obviously I can fix it a lot quicker :) On Tuesday, 29 April 2014 13:04:53 UTC+1, mooky wrote: I am seeing some very odd aggregation results - where the sum of the sub-aggregations is more than the parent bucket. Results:

"CSSX" : {
  "doc_count" : *24*,
  "intentDate" : {
    "buckets" : [
      { "key" : "Overdue", "to" : 1.3981248E12, "to_as_string" : "2014-04-22", "doc_count" : *1*,
        "ME" : { "doc_count" : *0* }, "NOT_ME" : { "doc_count" : *24* } },
      { "key" : "May", "from" : 1.3981248E12, "from_as_string" : "2014-04-22", "to" : 1.4006304E12, "to_as_string" : "2014-05-21", "doc_count" : *23*,
        "ME" : { "doc_count" : 0 }, "NOT_ME" : { "doc_count" : *24* } },
      { "key" : "June", "from" : 1.4006304E12, "from_as_string" : "2014-05-21", "to" : 1.4033088E12, "to_as_string" : "2014-06-21", "doc_count" : *0*,
        "ME" : { "doc_count" : *0* }, "NOT_ME" : { "doc_count" : *24* } }
    ]
  }
},

I wouldn't have thought that to be possible at all. Here is the request that generated the dodgy results.
"CSSX" : {
  "filter" : {
    "and" : {
      "filters" : [
        { "type" : { "value" : "inventory" } },
        { "term" : { "isAllocated" : false } },
        { "term" : { "intentMarketCode" : "CSSX" } },
        { "terms" : { "groupCompanyId" : [ "0D13EF2D0E114D43BFE362F5024D8873", "0D593DE0CFBE49BEA3BF5AD7CD965782", "1E9C36CC45C64FCAACDEE0AF4FB91FBA", "33A946DC2B0E494EB371993D345F52E4", "6471AA50DFCF4192B8DD1C2E72A032C7", "9FB2FFDC0FF0797FE04014AC6F0616B6", "9FB2FFDC0FF1797FE04014AC6F0616B6", "9FB2FFDC0FF2797FE04014AC6F0616B6", "9FB2FFDC0FF3797FE04014AC6F0616B6", "9FB2FFDC0FF5797FE04014AC6F0616B6", "9FB2FFDC0FF6797FE04014AC6F0616B6", "AFE0FED33F06AFB6E04015AC5E060AA3" ] } },
        { "not" : { "filter" : { "terms" : { "status" : [ "Cancelled", "Completed" ] } } } }
      ]
    }
  },
  "aggregations" : {
    "intentDate" : {
      "date_range" : {
        "field" : "intentDate",
        "ranges" : [
          { "key" : "Overdue", "to" : "2014-04-22" },
          { "key" : "May", "from" : "2014-04-22", "to" : "2014-05-21" },
          { "key" : "June", "from" : "2014-05-21", "to" : "2014-06-21" }
        ]
      },
      "aggregations" : {
        "ME" : { "filter" : { "term" : { "trafficOperatorSid" : "S-1-5-21-20xx ...

-- Adrien Grand
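As a sanity check while trying to reproduce this, the invariant the results above violate can be asserted mechanically. A sketch over a condensed version of the response shape from the thread (bucket values copied from the post):

```python
def find_violations(buckets, sub_agg_names):
    """A sub-aggregation is computed over the documents inside its
    parent bucket, so its doc_count can never legitimately exceed the
    parent's. Return (bucket_key, sub_agg) pairs that violate this."""
    violations = []
    for b in buckets:
        for name in sub_agg_names:
            if b[name]["doc_count"] > b["doc_count"]:
                violations.append((b["key"], name))
    return violations

# Condensed from the dodgy response in the thread
buckets = [
    {"key": "Overdue", "doc_count": 1,  "ME": {"doc_count": 0}, "NOT_ME": {"doc_count": 24}},
    {"key": "May",     "doc_count": 23, "ME": {"doc_count": 0}, "NOT_ME": {"doc_count": 24}},
    {"key": "June",    "doc_count": 0,  "ME": {"doc_count": 0}, "NOT_ME": {"doc_count": 24}},
]
print(find_violations(buckets, ["ME", "NOT_ME"]))
# [('Overdue', 'NOT_ME'), ('May', 'NOT_ME'), ('June', 'NOT_ME')]
```

Running such a check against every bucket of a suspect response makes it easy to spot which aggregation (of several identical ones) is producing the impossible counts.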
Re: Significant Term aggregation
Your second concern, that the query criteria are not identifying a result set with any sense of cohesion, might be true. Basically the search I am executing is a filter: either the document metadata has the value or it does not. Hence the result set may not be cohesive. My reason for using significant terms is so that the query can be enhanced to provide a more cohesive set of documents.

We can probably debug that from the results of the agg. For each significant term you should get a score, and all the ingredients that went into it are also available:
1) The number of docs in the result set with the given term
2) The size of your result set
3) The number of docs in the index with the given term (see the bg_count value)
4) The size of the index

In a cohesive set you should see a reasonable difference in the term probabilities, e.g. the numbers 1/2 vs 3/4. If all you've selected in your query is effectively random docs with no common theme, then the use of words in background and foreground barely differs, and 1/2 vs 3/4 are practically the same, giving a poor-scoring set of results. On Thursday, 1 May 2014 10:04:15 UTC-5, Mark Harwood wrote: Thanks for the feedback, Ramdev. What I noticed in my aggregation results is a lot of stopwords (a, an, the, at, and, etc.) being included as significant terms. These sorts of terms shouldn't really need any sort of special treatment. If they are appearing as suggestions then I expect one of the following statements to be true:
1) You have a very small number of docs in the result set representing the foreground sample. Significant terms needs a reasonable number of docs in a sample to draw any real conclusions.
2) You have query criteria that are not identifying a result set with any sense of cohesion, e.g. a query for random docs.
3) You have changed the set of stopwords in use in your index - what previously never used to appear at all is now suddenly common, or vice versa.
4) You are querying across mixed indices or doc-types (one with stop-words, one without) and we fail to tune out the stopwords as part of the results-merging process, because one small index reports them back as commonplace while another large index has them as missing or rare. In the merged stats they therefore appear to be highly correlated with your query request. Please let me know if none of these scenarios explains your results. Another possible enhancement would be phrase significance (instead of a single term, doing multi-term significance) - that would be nice. I outline some of the possibilities in creating phrases from significant terms, starting 51 minutes into this recent video: https://skillsmatter.com/skillscasts/5175-revealing-the-uncommonly-common-with-elasticsearch Cheers and thanks for all the fish You're welcome, and thanks again for the feedback Mark
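The four ingredients listed above can be checked by hand. A minimal sketch (with made-up counts) of the foreground-vs-background probability comparison that underlies the scoring:

```python
def term_probabilities(fg_count, fg_size, bg_count, bg_size):
    """Compare a term's frequency in the result set (foreground)
    against its frequency in the whole index (background):
    ingredient 1 / ingredient 2 vs ingredient 3 / ingredient 4."""
    return fg_count / fg_size, bg_count / bg_size

# A cohesive result set: the term is far more common in the
# foreground than in the background -> a good significant term.
fg, bg = term_probabilities(fg_count=30, fg_size=100,
                            bg_count=200, bg_size=100000)
print(fg, bg)  # 0.3 vs 0.002

# A stopword such as "the": roughly as common in both sets, so the
# ratio is near 1 and the term should score poorly.
fg_stop, bg_stop = term_probabilities(fg_count=95, fg_size=100,
                                      bg_count=95000, bg_size=100000)
print(fg_stop / bg_stop)  # ~1.0
```

If stopwords show bg_count probabilities that differ wildly between indices being merged (scenario 4), this kind of hand calculation on the returned counts makes the mismatch visible immediately.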
Re: Terms Aggregations
Hi Adrien, Thanks for your answer, but I have a question. Wouldn't that give me the different sums of the values of those fields? What I need is, using the example from before:

Doc1: { field1: "A", field2: "B", field3: "C", size: 1 }
Doc2: { field1: "A", field2: "B2", field3: "C2", size: 2 }
Doc3: { field1: "Z", field2: "B3", field3: "C3", size: 99 }

If I search in my index and those three documents match my query, I want a list of the possible values that field1 can take, and the sum of the size fields for all documents with each value in my result set. So in this case I would expect:

field1: {
  { value: "A", sum_of_sizes: 3 },
  { value: "Z", sum_of_sizes: 99 }
}

Thanks, Jose. On Friday, 2 May 2014 14:51:36 UTC+1, Adrien Grand wrote: Hi Jose, There are two ways to do so: either with a script (slow because term ordinals can't be used):

terms: { script: "doc['A'].values + doc['B'].values + doc['C'].values" }

Or by having all values in a single field at indexing time (potentially using copy_to [1]). [1] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#copy-to On Fri, May 2, 2014 at 11:44 AM, Jose A. Garcia argan...@gmail.com wrote: Hi, I have a question about aggregations. I have documents with several fields:

{ field1: "A", field2: "B", field3: "C", size: 1 }
{ field1: "A", field2: "B2", field3: "C2", size: 2 }
{ field1: "Z", field2: "B3", field3: "C3", size: 99 }

And I need to be able to calculate aggregations for each one of those fields, and get the sum of the sizes for each field. So, for example, aggregating by field1 should get me { A, size = 3 }, { Z, size = 99 }. Looking at the documentation for aggregations I can see how to get the sum for a field and how to get the terms and their counts, but I need a combination of both. What is the best way to do this? Thanks in advance, Jose.

-- Adrien Grand
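What Jose describes sounds like a terms aggregation with a nested sum sub-aggregation. Sketched below is the equivalent client-side computation for the three example documents, with the aggregation body I believe would produce the same numbers left as a comment (agg names are made up; worth verifying against the aggregations reference):

```python
from collections import defaultdict

# The three documents from the thread
docs = [
    {"field1": "A", "field2": "B",  "field3": "C",  "size": 1},
    {"field1": "A", "field2": "B2", "field3": "C2", "size": 2},
    {"field1": "Z", "field2": "B3", "field3": "C3", "size": 99},
]

# Client-side equivalent of a terms aggregation on field1 with a sum
# sub-aggregation on size, i.e. (sketch, not verified):
#   "aggs": {"field1_values": {"terms": {"field": "field1"},
#            "aggs": {"sum_of_sizes": {"sum": {"field": "size"}}}}}
sum_of_sizes = defaultdict(int)
for doc in docs:
    sum_of_sizes[doc["field1"]] += doc["size"]

print(dict(sum_of_sizes))  # {'A': 3, 'Z': 99}
```

The same pattern would repeat, one terms aggregation per field, for field2 and field3.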
Re: How to write a custom river
org.elasticsearch.common.settings.NoClassSettingsException: Failed to load class with value [river]
    at org.elasticsearch.river.RiverModule.loadTypeModule(RiverModule.java:87)
    at org.elasticsearch.river.RiverModule.spawnModules(RiverModule.java:58)
    at org.elasticsearch.common.inject.ModulesBuilder.add(ModulesBuilder.java:44)
    at org.elasticsearch.river.RiversService.createRiver(RiversService.java:137)
    at org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:275)
    at org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:269)
    at org.elasticsearch.action.support.TransportAction$ThreadedActionListener$1.run(TransportAction.java:93)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassNotFoundException: river
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at org.elasticsearch.river.RiverModule.loadTypeModule(RiverModule.java:73)
    ... 9 more

On Friday, May 2, 2014 8:51:42 AM UTC-5, Joshua Chan wrote: I'm not sure I follow.
In my Plugin.onModule I have:

public void onModule(RiversModule module) {
    module.registerRiver(RiverImpl.TYPE, ModuleImpl.class);
    //client.admin().indices().prepareDeleteMapping("_river").setType(riverName.name()).execute();
}

And on my Module I have:

protected void configure() {
    bind(River.class).to(RiverImpl.class).asEagerSingleton();
}

On Thursday, May 1, 2014 11:08:28 PM UTC-5, Rob Ottaway wrote: I should have sent you the following earlier rather than a non-river plugin: the plugin: https://github.com/elasticsearch/elasticsearch-river-rabbitmq/blob/master/src/main/java/org/elasticsearch/plugin/river/rabbitmq/RabbitmqRiverPlugin.java The river implementation: https://github.com/elasticsearch/elasticsearch-river-rabbitmq/blob/master/src/main/java/org/elasticsearch/river/rabbitmq/RabbitmqRiver.java The module: https://github.com/elasticsearch/elasticsearch-river-rabbitmq/blob/master/src/main/java/org/elasticsearch/river/rabbitmq/RabbitmqRiverModule.java Looks like you are registering the river implementation rather than the river module, hence the not working. Had to look at an example I know works to figure it out. -Rob On Thu, May 1, 2014 at 8:04 PM, Joshua Chan joshua.be...@gmail.com wrote: So, that's what I did, but no love... I checked in the latest. -Josh On Thursday, May 1, 2014 9:49:11 PM UTC-5, Rob Ottaway wrote: Look at this plugin for help: https://github.com/elasticsearch/elasticsearch-cloud-aws/blob/master/src/main/resources/es-plugin.properties Yes, it needs to be the FQN. On Thursday, May 1, 2014 5:47:31 PM UTC-7, Joshua Chan wrote: Thanks Rob. Someone else also told me the plugin property should be the fully qualified name. I didn't declare a package, so I guess I'm using the default package, and I thought I had the namespacing right since IntelliJ corrected the class name when I wrote it. Thoughts?
-Josh On Thursday, May 1, 2014 5:23:25 PM UTC-5, Rob Ottaway wrote: Look at this file in your BB repo: https://bitbucket.org/futurechan/example-river/src/fd23648c3e7cc42fd2286d4134e80ecd7e98f802/src/main/resources/es-plugin.properties?at=master cheers On Thursday, May 1, 2014 3:21:59 PM UTC-7, Rob Ottaway wrote: This strikes me as odd: java.lang.ClassNotFoundException: example_river Assume you didn't map the string example_river to the actual class name properly? -Rob On Thursday, May 1, 2014 11:40:52 AM UTC-7, Joshua Chan wrote: I'm making my first go at writing a river. (Here's the source code: https://bitbucket.org/futurechan/example-river/src) I followed this tutorial http://blog.trifork.com/2013/01/10/how-to-write-an-elasticsearch-river-plugin/ and compared it to this existing river https://github.com/jprante/elasticsearch-river-jdbc but I haven't had much luck. To deploy the river, I created a folder called example-river under plugins, dropped my jar in that folder, and restarted the node. Everything starts up fine. I have also tried bin/plugin --url file:///path/to/plugin --install example-river, which seems to work, but it unpacks my jar. So, I tried zipping it first and then installing, which works and does not unpack my jar, but it didn't help. When I issue this PUT request: http://localhost:9200/_river/example_river/_meta {
Re: How to write a custom river
I think he means in your Guice module. You are registering the WRONG thing ;) On Fri, May 2, 2014 at 6:49 AM, Joshua Chan joshua.bennett.c...@gmail.com wrote: I've tried this too with no luck: http://localhost:9200/_river/example_river/_meta { "type": "river", "river": { "blah": "blah" } } On Friday, May 2, 2014 3:31:23 AM UTC-5, Luca Cavanna wrote: Hi Joshua, the package is not an issue if you are using the default one for your classes. Looking deeper, the type of the river that you try to register with your REST call doesn't match the type of the river you registered in the plugin when you did module.registerRiver(type, riverclass). Cheers Luca On Friday, May 2, 2014 6:08:28 AM UTC+2, Rob Ottaway wrote: I should have sent you the following earlier rather than a non-river plugin: the plugin: https://github.com/elasticsearch/elasticsearch-river-rabbitmq/blob/master/src/main/java/org/elasticsearch/plugin/river/rabbitmq/RabbitmqRiverPlugin.java The river implementation: https://github.com/elasticsearch/elasticsearch-river-rabbitmq/blob/master/src/main/java/org/elasticsearch/river/rabbitmq/RabbitmqRiver.java The module: https://github.com/elasticsearch/elasticsearch-river-rabbitmq/blob/master/src/main/java/org/elasticsearch/river/rabbitmq/RabbitmqRiverModule.java Looks like you are registering the river implementation rather than the river module, hence the not working. Had to look at an example I know works to figure it out. -Rob On Thu, May 1, 2014 at 8:04 PM, Joshua Chan joshua.be...@gmail.com wrote: So, that's what I did, but no love... I checked in the latest.
I didn't declare a package, so I guess I'm using the default package, and I thought I had the namespacing right since IntelliJ corrected the class name when I wrote it. Thoughts? -Josh On Thursday, May 1, 2014 5:23:25 PM UTC-5, Rob Ottaway wrote: Look at this file in your BB repo: https://bitbucket.org/futurechan/example-river/src/fd23648c3e7cc42fd2286d4134e80ecd7e98f802/src/main/resources/es-plugin.properties?at=master cheers On Thursday, May 1, 2014 3:21:59 PM UTC-7, Rob Ottaway wrote: This strikes me as odd: java.lang.ClassNotFoundException: example_river Assume you didn't map the string example_river to the actual class name properly? -Rob On Thursday, May 1, 2014 11:40:52 AM UTC-7, Joshua Chan wrote: I'm making my first go at writing a river. (Here's the source code: https://bitbucket.org/futurechan/example-river/src) I followed this tutorial http://blog.trifork.com/2013/01/10/how-to-write-an-elasticsearch-river-plugin/ and compared it to this existing river https://github.com/jprante/elasticsearch-river-jdbc but I haven't had much luck. To deploy the river, I created a folder called example-river under plugins, dropped my jar in that folder, and restarted the node. Everything starts up fine. I have also tried bin/plugin --url file:///path/to/plugin --install example-river, which seems to work, but it unpacks my jar. So, I tried zipping it first and then installing, which works and does not unpack my jar, but it didn't help.
When I issue this PUT request: http://localhost:9200/_river/example_river/_meta { "type": "example_river", "example_river": { "blah": "blah" } } I get this exception:

[2014-04-20 22:28:46,538][DEBUG][river ] [Gloom] creating river [example_river][example_river]
[2014-04-20 22:28:46,543][WARN ][river ] [Gloom] failed to create river [example_river][example_river]
org.elasticsearch.common.settings.NoClassSettingsException: Failed to load class with value [example_river]
    at org.elasticsearch.river.RiverModule.loadTypeModule(RiverModule.java:87)
    at org.elasticsearch.river.RiverModule.spawnModules(RiverModule.java:58)
    at org.elasticsearch.common.inject.ModulesBuilder.add(ModulesBuilder.java:44)
    at org.elasticsearch.river.RiversService.createRiver(RiversService.java:137)
    at org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:275)
    at org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:269)
    at org.elasticsearch.action.support.TransportAction$ThreadedActionListener$1.run(TransportAction.java:93)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassNotFoundException: example_river
    at java.net.URLClassLoader$1.run(Unknown Source)
    at
Re: How to write a custom river
Oh sorry, the module is fine; it's the call to module.registerRiver which is being passed the River itself and not the River Guice module. Try that change. On Fri, May 2, 2014 at 10:33 AM, Rob Ottaway robotta...@gmail.com wrote: I think he means in your Guice module. You are registering the WRONG thing ;) On Fri, May 2, 2014 at 6:49 AM, Joshua Chan joshua.bennett.c...@gmail.com wrote: I've tried this too with no luck: http://localhost:9200/_river/example_river/_meta { "type": "river", "river": { "blah": "blah" } } On Friday, May 2, 2014 3:31:23 AM UTC-5, Luca Cavanna wrote: Hi Joshua, the package is not an issue if you are using the default one for your classes. Looking deeper, the type of the river that you try to register with your REST call doesn't match the type of the river you registered in the plugin when you did module.registerRiver(type, riverclass). Cheers Luca On Friday, May 2, 2014 6:08:28 AM UTC+2, Rob Ottaway wrote: I should have sent you the following earlier rather than a non-river plugin: the plugin: https://github.com/elasticsearch/elasticsearch-river-rabbitmq/blob/master/src/main/java/org/elasticsearch/plugin/river/rabbitmq/RabbitmqRiverPlugin.java The river implementation: https://github.com/elasticsearch/elasticsearch-river-rabbitmq/blob/master/src/main/java/org/elasticsearch/river/rabbitmq/RabbitmqRiver.java The module: https://github.com/elasticsearch/elasticsearch-river-rabbitmq/blob/master/src/main/java/org/elasticsearch/river/rabbitmq/RabbitmqRiverModule.java Looks like you are registering the river implementation rather than the river module, hence the not working. Had to look at an example I know works to figure it out. -Rob On Thu, May 1, 2014 at 8:04 PM, Joshua Chan joshua.be...@gmail.com wrote: So, that's what I did, but no love... I checked in the latest.
-Josh

On Thursday, May 1, 2014 9:49:11 PM UTC-5, Rob Ottaway wrote:

Look at this plugin for help:
https://github.com/elasticsearch/elasticsearch-cloud-aws/blob/master/src/main/resources/es-plugin.properties

Yes, it needs to be the FQN.

On Thursday, May 1, 2014 5:47:31 PM UTC-7, Joshua Chan wrote:

Thanks Rob. Someone else also told me the plugin property should be the fully qualified name. I didn't declare a package, so I guess I'm using the default package, and I thought I had the namespacing right since IntelliJ corrected the class name when I wrote it. Thoughts?

-Josh

On Thursday, May 1, 2014 5:23:25 PM UTC-5, Rob Ottaway wrote:

Look at this file in your BB repo:
https://bitbucket.org/futurechan/example-river/src/fd23648c3e7cc42fd2286d4134e80ecd7e98f802/src/main/resources/es-plugin.properties?at=master

cheers

On Thursday, May 1, 2014 3:21:59 PM UTC-7, Rob Ottaway wrote:

This strikes me as odd:

java.lang.ClassNotFoundException: example_river

I assume you didn't map the string example_river to the actual class name properly?

-Rob

On Thursday, May 1, 2014 11:40:52 AM UTC-7, Joshua Chan wrote:

I'm making my first go at writing a river. (Here's the source code: https://bitbucket.org/futurechan/example-river/src) I followed this tutorial http://blog.trifork.com/2013/01/10/how-to-write-an-elasticsearch-river-plugin/ and compared it to this existing river https://github.com/jprante/elasticsearch-river-jdbc but I haven't had much luck.

To deploy the river, I created a folder called example-river under plugins, dropped my jar in that folder, and restarted the node. Everything starts up fine. I have also tried bin/plugin --url file:///path/to/plugin --install example-river, which seems to work, but it unpacks my jar. So I tried zipping it first and then installing, which works and does not unpack my jar, but it didn't help.
When I issue this PUT request:

PUT http://localhost:9200/_river/example_river/_meta
{ "type": "example_river", "example_river": { "blah": "blah" } }

I get this exception:

[2014-04-20 22:28:46,538][DEBUG][river ] [Gloom] creating river [example_river][example_river]
[2014-04-20 22:28:46,543][WARN ][river ] [Gloom] failed to create river [example_river][example_river]
org.elasticsearch.common.settings.NoClassSettingsException: Failed to load class with value [example_river]
    at org.elasticsearch.river.RiverModule.loadTypeModule(RiverModule.java:87)
    at org.elasticsearch.river.RiverModule.spawnModules(RiverModule.java:58)
    at org.elasticsearch.common.inject.ModulesBuilder.add(ModulesBuilder.java:44)
    at org.elasticsearch.river.RiversService.createRiver(RiversService.java:137)
    at org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:275)
    at org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:269)
    at org.elasticsearch.action.support.TransportAction$ThreadedActionListener$1.run(TransportAction.java:93)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassNotFoundException: example_river
    at java.net.URLClassLoader$1.run(Unknown Source)
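As the replies in this thread work out, the NoClassSettingsException arises because the "type" in the _meta document must exactly match the name the plugin registered via module.registerRiver(name, RiverModuleClass), and the module (not the River implementation) must be what gets registered. A small sketch of the payload contract, in Python purely for illustration (the helper name and settings are made up, not an Elasticsearch API):

```python
# Hypothetical helper showing the shape of the river _meta document.
# The top-level "type" and the settings key must both use the exact name
# the plugin registered via module.registerRiver(name, ModuleClass).

def make_river_meta(registered_type, settings):
    """Build the body for PUT /_river/<river_name>/_meta."""
    return {
        "type": registered_type,    # must match the registered river name
        registered_type: settings,  # river-specific settings live under it
    }

meta = make_river_meta("example_river", {"blah": "blah"})
```

If the "type" value names something no installed plugin registered, the RiverModule class loader falls through to Class.forName and fails with exactly the ClassNotFoundException shown in the log above.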
Re: Significant Term aggregation
I think I should clarify something. Even though my query is essentially a filter, the significant terms aggregation is run against the body of the documents (which is typical prose in a news document). Here is an example.

Query: query the index to find docs with a specific string in field Class_Text, with a Significant Terms aggregation on the body of the document:

POST _search
{
  "size": 0,
  "query": {
    "nested": {
      "query": {
        "match": {
          "Class_Text": { "query": "Fuel Cell Battery", "type": "boolean" }
        }
      },
      "path": "SMART_TERM"
    }
  },
  "aggregations": {
    "sigTerms": {
      "significant_terms": { "field": "BODY.v", "size": 1000 }
    }
  }
}

..
{ "key": "resistance", "doc_count": 68795, "score": 53.42999474620047, "bg_count": 129149 },
{ "key": "patented", "doc_count": 42848, "score": 50.98806065128648, "bg_count": 52548 },
{ "key": "marketintelligencecenter.com's", "doc_count": 33701, "score": 48.58994469232905, "bg_count": 34122 },
{ "key": "for", "doc_count": 427040, "score": 47.73227955829178, "bg_count": 5483708 },
{ "key": "html", "doc_count": 91658, "score": 46.79933234224686, "bg_count": 261374 },
{ "key": "an", "doc_count": 348706, "score": 43.20270422802958, "bg_count": 4046974 },
{ "key": "protection", "doc_count": 80987, "score": 43.187880126230326, "bg_count": 221159 },
{ "key": "of", "doc_count": 430217, "score": 42.90990816758588, "bg_count": 6177535 },
{ "key": "by", "doc_count": 364873, "score": 42.68719313911975, "bg_count": 4480098 },
...

As you can see, words like "for", "an", "of", and "by" are showing up in the aggregation list with scores high enough to put them in the top 50 significant terms. The documents get tagged with Class_Text after being classified, and that value is what the query matches on. In my case it would be more helpful if I were able to get phrases rather than terms. (I am yet to finish watching your presentation.) Let me know if you have any insight.

Thanks much,
Ramdev

On Fri, May 2, 2014 at 9:07 AM, Mark Harwood mark.harw...@elasticsearch.com wrote:

Your second concern, that the query criteria is not identifying a result set with any sense of cohesion, might be true.
Basically the search I am executing is a filter: the document metadata either has the value or not. Hence the result set may not be cohesive. The reason for me to use significant terms is so that the query can be enhanced to provide a more cohesive set of documents.

We can probably debug that from the results of the agg. For each significant term you should get a score, and all the ingredients that went into it are also available:
1) The number of docs in the result set with the given term
2) The size of your result set
3) The number of docs in the index with the given term (see the bg_count value)
4) The size of the index

In a cohesive set you should see a reasonable difference in the term probabilities, e.g. the ratios 1/2 vs 3/4. If all you've selected in your query is effectively random docs with no common theme, then the use of words in background and foreground barely differs, 1/2 vs 3/4 are practically the same, and you get a poor-scoring set of results.

On Thursday, 1 May 2014 10:04:15 UTC-5, Mark Harwood wrote:

Thanks for the feedback, Ramdev.

What I noticed in my aggregation results is a lot of stopwords (a, an, the, at, and, etc.) being included as significant terms.

These sorts of terms shouldn't really need any sort of special treatment. If they are appearing as suggestions then I expect one of the following statements to be true:
1) You have a very small number of docs in the result set representing the foreground sample. Significant terms needs a reasonable number of docs in a sample to draw any real conclusions.
2) You have query criteria that is not identifying a result set with any sense of cohesion, e.g. a query for random docs.
3) You have changed the set of stopwords in use in your index - what previously never used to appear at all is now suddenly common, or vice versa.
4) You are querying across mixed indices or doc-types (one with stop-words, one without) and we fail to tune-out the stopwords as part of the results merging process because one small index reports
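The four "ingredients" listed above can be turned into a quick sanity check. The sketch below is not Elasticsearch's actual significance heuristic, only the foreground-vs-background probability contrast it builds on; the doc_count and bg_count figures come from the agg output earlier in the thread, while the foreground size (500,000) and index size (6,500,000) are assumed values for illustration:

```python
def term_contrast(fg_count, fg_total, bg_count, bg_total):
    """Compare a term's foreground probability (ingredients 1/2) to its
    background probability (ingredients 3/4). Ratios near 1.0 mean the
    term is no more common in the result set than in the index as a
    whole -- the 'random docs' symptom described in the thread."""
    fg_prob = fg_count / fg_total
    bg_prob = bg_count / bg_total
    return fg_prob / bg_prob

# A stopword-like term ("for"): common everywhere, ratio close to 1
weak = term_contrast(427040, 500000, 5483708, 6500000)

# A genuinely topical term: far more frequent in the foreground sample
strong = term_contrast(500, 1000, 2000, 1_000_000)
```

If most of your top terms come out with ratios hovering around 1.0, that points at symptom (2) above: the filter is selecting documents with no shared vocabulary beyond ordinary prose.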
Re: Handling updates from multiple sources
My system is changing rapidly. The final result is to have all the data inside the ES index. The way I have it set up currently, I have 2 different systems that write to the ES index:
1) Bulk job. Run through all the dbs, fetch things in batch updates of 5k, and send them to ES.
2) Live updating job. Pick up the newest changes and send them to ES, either updates or inserts. Note: the updates don't contain full documents.

After steps (1) and (2) I would like to have an (almost) 100% guarantee that the index is full and up to date. I think that this is quite a common use case if you want an index with live data, not data stale as of the time the bulk job began.

On Thursday, May 1, 2014 at 19:45:53 UTC-7, Rob Ottaway wrote:

I missed that the later doc would only be partial. What is the reason to use the partial doc? That really complicates things. Filling in missing fields is going to be a very large headache. You'll probably kill performance trying to do it too. Likely it'll be so complex it will present a lot more trouble. I think if you can better present the overall use cases you will get better insight into how to work this out.

On Thursday, May 1, 2014 4:51:03 PM UTC-7, Michał Zgliczyński wrote:

Hi, thank you for your response. I have looked through this blog post: http://www.elasticsearch.org/blog/elasticsearch-versioning-support/

It looks as if external versioning would be the way to go: have the timestamps act as version numbers and let ES only pick the document with the newest version as the correct document. However, with the situation I have presented above, ES will fail. A quote from the post:

With version_type set to external, Elasticsearch will store the version number as given and will not increment it. Also, instead of checking for an exact match, Elasticsearch will only return a version collision error if the version currently stored is greater or equal to the one in the indexing command.
This effectively means "only store this information if no one else has supplied the same or a more recent version in the meantime". Concretely, the above request will succeed if the stored version number is smaller than 526; 526 and above will cause the request to fail.

In my example, we would have exactly that situation: a partial doc with a larger version number (later timestamp) is already stored in ES, and then we get the complete document with a smaller timestamp. In this situation we would like to merge these 2 documents so that we keep all of the fields from the partial doc and the remaining fields (not currently present in the ES document) get filled in from the complete document.

Thanks!
Michal Zgliczynski

On Thursday, May 1, 2014 at 14:58:31 UTC-7, Rob Ottaway wrote:

Have you looked at using versioning? http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html#index-versioning

cheers,
Rob

On Thursday, May 1, 2014 2:47:39 PM UTC-7, Michał Zgliczyński wrote:

Hi, I am building a system in which I will have two sources of updates:
1) Bulk updating from the source of truth (db) - always inserting complete documents
2) Live updates - inserts and updates (complete and incomplete docs)

Also, let's assume that each insert/update has a timestamp which we trust (not the ES timestamp). The idea is to have a complete, up-to-date index once the bulk updating finishes. To achieve this I need to guarantee that I will have the correct data. This would work mostly well if we did everything as upserts and the inserts/updates coming into ES had strictly increasing timestamps. But one could imagine a possibly problematic situation, when:
1) We are performing bulk indexing: a) we read an object from the db, b) process it, c) send it to ES.
2) We have an update on the same object after step (a) and before it makes it to ES in the bulk updating - phase (c).
That is, ES gets an update with new data, and only after that do we get the insert with the entire document from the source of truth with older data. Hence, in ES we have a document with a newer timestamp than the one newly added in phase (c).

My theoretical solution: for each operation, have the timestamp for that change (the timestamp from the system that made the change, not from Elasticsearch). Let's say that all of the operations we perform are upserts. Then, once we get an insert or an update (let's call it doc), we have to perform the following script (pseudo-MVEL) inside ES:

{
  if (doc.timestamp > ctx._source.timestamp) {
    // doc is newer than what was in ES
    upsert(doc); // update the index with all of the info from the new doc
  } else {
    // there is already a document in ES with a newer timestamp; note, this
    // may be an incomplete document (an update)
    __fill the missing fields in the document in ES with values from doc__
  }
}
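The merge rule in the pseudo-script above can be prototyped outside ES. This is a Python sketch under the assumption that documents are flat dicts carrying a "timestamp" field; the helper and field names are illustrative, not an Elasticsearch API:

```python
# Sketch of the timestamp-based upsert/merge described above: a newer
# incoming doc replaces the stored one wholesale; an older (complete)
# incoming doc only fills in fields the stored partial doc is missing.

def merge_update(stored, incoming):
    """Return the document that should end up in the index.

    stored   -- the doc currently in ES (or None), possibly partial
    incoming -- the doc arriving from the bulk or live job
    """
    if stored is None or incoming["timestamp"] > stored["timestamp"]:
        return dict(incoming)      # incoming is newer: take it entirely
    merged = dict(incoming)        # start from the older, complete doc
    merged.update(stored)          # newer stored fields (and timestamp) win
    return merged
```

For example, if a partial live update {"timestamp": 10, "price": 5} is already stored and the bulk job then delivers the older complete doc {"timestamp": 8, "price": 4, "name": "widget"}, the merge keeps the newer price and timestamp while filling in the missing name. In a real deployment this logic would have to live in a scripted upsert so the read-merge-write happens atomically inside ES rather than in the client.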