Function score query does NOT replace boost field well - how to access child's fields?!
Hello,

The boost field has been deprecated since ES 1.0.0, so I decided to replace it with a function score query, as suggested in the docs. I tried to change this:
https://gist.github.com/zwrss/aeaf2828f6dd35a1e888#file-boost-field
into this:
https://gist.github.com/zwrss/aeaf2828f6dd35a1e888#file-function-score

The problem is that I can't use the child's fields inside the top_children query. How should I handle this?

Thanks in advance,
Paweł Młynarczyk

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4d68bab8-30cf-4382-bf78-36753b9f3ca6%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
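One shape that may be worth trying for the question above is to put the function score query inside top_children, so that the script is evaluated against child documents rather than parents. The sketch below only builds the request body in Python. The child type name `child`, the numeric field `boost` and the `match` query are assumptions for illustration (the gists are not reproduced in this thread), and whether this gives the same ranking as the old boost field has not been verified.

```python
import json


def child_boosted_query(child_type, child_query, boost_field):
    """Build a top_children query whose inner (child-level) query is wrapped
    in function_score, so the script sees the child document's fields.

    All names passed in are hypothetical; adapt them to the real mapping.
    """
    return {
        "query": {
            "top_children": {
                "type": child_type,
                "score": "max",
                "query": {
                    "function_score": {
                        "query": child_query,
                        # Multiply the text score by the child's own boost value.
                        "script_score": {
                            "script": "_score * doc['%s'].value" % boost_field
                        },
                        "boost_mode": "replace",
                    }
                },
            }
        }
    }


body = child_boosted_query(
    "child", {"match": {"title": "elasticsearch"}}, "boost"
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a search request against the parent type. The design choice here is simply to move the scoring function to the level where the field lives.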
Re: filtered query vs query performance
I tried querying the children first and then filtering the parents by ID, but that breaks the scoring and the performance is even worse (as expected). I still have not found a satisfactory solution to this problem.

(In reply to Paweł Młynarczyk's message of Thursday, 3 April 2014 14:44:37 UTC+2.)
Re: Sub-aggregations not working as expected
Hello Chris,

Elasticsearch does not treat the entries of the ingredient list in your recipe documents as logically connected groups of fields. So when your top-level aggregation returns a bucket for, e.g., Rock, the sub-aggregation does not compute stats over the Rock quantities only, but over the quantities of all ingredients in every document that also contains Rock.

You could try indexing the ingredients as child documents of the main document and then aggregating over the child type. You may also want to read this:
http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/

Best regards,
Paweł Młynarczyk

On Monday, 7 April 2014 02:16:38 UTC+2, chris Hahn wrote:
> Playing with elasticsearch for a project.
> This is sample data: every document will have a list of ingredients, and some
> of the ingredients may be used in different documents and in different
> quantities. I would like an aggregation that lists all ingredients for a
> search, and the stats of each ingredient.
>
> I'm looking to this group for suggestions: can I structure my query
> differently to get the results I would like (or should I structure my
> document differently)?
> Basically I have two constraints: I don't know what the ingredients are
> when the query is written, and I would like to list all ingredients and the
> average amounts of each.
>
> Sample data:
> {
>   "ingredients": [
>     { "name": "Rock", "quantity": 6, "unit": "lb" },
>     { "name": "Dirt", "quantity": 6, "unit": "lb" },
>     { "name": "Mortar", "quantity": 3, "unit": "lb" }
>   ]
> }
>
> This query looks like it works, but doesn't. I'm quite confused as to
> where these numbers are coming from.
>
> Query:
> POST /concrete/recipe/_search
> {
>   "query" : {"match_all" : {}},
>   "aggs" : {
>     "ingredients" : {
>       "terms" : {
>         "field" : "ingredients.name"
>       },
>       "aggs" : {
>         "pounds" : { "stats" : { "field" : "ingredients.quantity" } }
>       }
>     }
>   }
> }
>
> Results:
> {
>   "took": 4,
>   "timed_out": false,
>   "_shards": {
>     "total": 1,
>     "successful": 1,
>     "failed": 0
>   },
>   "hits": {
>     "total": 2,
>     "max_score": 1,
>     "hits": [
>       {
>         "_index": "concrete",
>         "_type": "recipe",
>         "_id": "1",
>         "_score": 1,
>         "_source": {
>           "ingredients": [
>             { "name": "Rock", "quantity": 6, "unit": "lb" },
>             { "name": "Dirt", "quantity": 6, "unit": "lb" },
>             { "name": "Mortar", "quantity": 3, "unit": "lb" }
>           ]
>         }
>       },
>       {
>         "_index": "concrete",
>         "_type": "recipe",
>         "_id": "2",
>         "_score": 1,
>         "_source": {
>           "ingredients": [
>             { "name": "Rock", "quantity": 8, "unit": "lb" },
>             { "name": "Quartz", "quantity": 0.5, "unit": "l
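The effect described in the reply can be reproduced with a small model in plain Python, without Elasticsearch. It is a simplified sketch, not the actual aggregation code: document 1 is the sample document from the thread, document 2 uses only the two ingredients that are visible before the results are cut off, duplicate quantities within a document are kept (real behaviour may differ by version), and names are not lowercased the way an analyzed field would be.

```python
from collections import defaultdict

# (name, quantity) pairs per document. Document 2 is incomplete in the thread,
# so only its two visible ingredients are used here.
docs = [
    [("Rock", 6), ("Dirt", 6), ("Mortar", 3)],
    [("Rock", 8), ("Quartz", 0.5)],
]


def flattened_avg(documents):
    """Model object arrays: a document is a bag of names plus a bag of
    quantities, with no link between a name and its own quantity."""
    buckets = defaultdict(list)
    for doc in documents:
        names = {name for name, _ in doc}
        quantities = [qty for _, qty in doc]  # simplification: duplicates kept
        for name in names:
            buckets[name].extend(quantities)
    return {name: sum(v) / len(v) for name, v in buckets.items()}


def paired_avg(documents):
    """Model child (or nested) documents: each quantity stays attached
    to the ingredient it belongs to."""
    buckets = defaultdict(list)
    for doc in documents:
        for name, qty in doc:
            buckets[name].append(qty)
    return {name: sum(v) / len(v) for name, v in buckets.items()}


flat = flattened_avg(docs)
paired = paired_avg(docs)
print("Rock, flattened:", flat["Rock"])
print("Rock, paired:", paired["Rock"])
```

In this model the flattened average for Rock is 4.7, because it mixes in the Dirt, Mortar and Quartz quantities, while the paired average is 7.0, the mean of 6 and 8.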
Re: filtered query vs query performance
Update: filtered aliases let me speed up the query (by about 50%). The problem is that I can't use aliases, because the filter I apply is not static. Why does a filtered alias perform so much better than a filtered query?

(In reply to Paweł Młynarczyk's message of Thursday, 3 April 2014 14:44:37 UTC+2.)
filtered query vs query performance
Hello,

I have a parent-child index in my app (mapping gist: https://gist.github.com/zwrss/9953291#file-mapping). I have about 250,000 parents and 7,000,000 children.

I was trying to speed up my top_children query (gist: https://gist.github.com/zwrss/9953291#file-parent-query), so I added some filters to ensure that the query has to score only about 300 parents (gist: https://gist.github.com/zwrss/9953291#file-filtered-parent-query). The effect was surprising: the original query executed in ~150 ms and the filtered one in ~270 ms. I had expected that adding filters to reduce the scoring work would help.

I suspected that top_children was the cause, so I ran another test: I queried only the children and added some filters to narrow the results. The original query (gist: https://gist.github.com/zwrss/9953291#file-child-query) executed in ~90 ms and the filtered one (gist: https://gist.github.com/zwrss/9953291#file-filtered-child-query) in ~120 ms.

Is that the correct behaviour? Am I missing something?

Best regards,
Paweł Młynarczyk
Re: Thread Pools and Queues
Thank you for your answer.

I ran some tests, and it appears that the smallest queue size among the data nodes is the effective one (as far as I can tell, search requests are spread across all data nodes), and that queue sizes on non-data nodes are ignored (I guess they are not literally ignored, but forwarding search requests to the data nodes bypasses the local queue). So the suggested solution does not work. Or am I missing something?

Paweł Młynarczyk

On Monday, 24 March 2014 15:17:22 UTC+1, Jörg Prante wrote:
> You could set up a special node for "low priority" search, with "slim"
> thread pool settings, and forward client calls to it respectively.
>
> Jörg
>
> (In reply to Paweł Młynarczyk's message of Mon, Mar 24, 2014 at 3:08 PM.)
Thread Pools and Queues
There are many users working with the application. Some of them are online users, whose requests should be served in real time, and some of them are workers preparing reports, whose requests can wait. At the moment, all requests arriving at Elasticsearch are placed in the same queue. As a result, online users sometimes do not get their requests served as soon as possible, or even get them rejected when the queue is full of workers' requests.

That could be solved by adding a separate thread pool and queue for "low priority search". The same would apply to other operations (get, bulk, etc.).

Are there any plans, or any chance, for this kind of feature to be implemented in Elasticsearch?
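The behaviour requested above can be illustrated with a small model in plain Python. This is not an Elasticsearch feature or API. The class name, the priority labels `online` and `batch`, and the queue sizes are all hypothetical, and the sketch only shows the intended semantics: a full low-priority queue rejects further low-priority work without affecting online requests.

```python
import queue


class PrioritizedSearchQueues:
    """Two independent bounded queues, one per priority class
    (a hypothetical design, not how Elasticsearch works)."""

    def __init__(self, online_size, batch_size):
        self.queues = {
            "online": queue.Queue(maxsize=online_size),
            "batch": queue.Queue(maxsize=batch_size),
        }

    def submit(self, priority, request):
        """Enqueue a request; return False if that class's queue is full."""
        try:
            self.queues[priority].put_nowait(request)
            return True
        except queue.Full:
            return False

    def next_request(self):
        """Serve online requests first; fall back to batch work when idle."""
        for priority in ("online", "batch"):
            try:
                return self.queues[priority].get_nowait()
            except queue.Empty:
                continue
        return None


pools = PrioritizedSearchQueues(online_size=2, batch_size=1)
results = [
    pools.submit("batch", "report-1"),
    pools.submit("batch", "report-2"),  # batch queue is full: rejected
    pools.submit("online", "user-1"),   # unaffected by the full batch queue
]
served = [pools.next_request() for _ in range(3)]
print(results, served)
```

Here the second report request is rejected, the online request is accepted and served first, and the queued report follows once no online work is waiting.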
Mapper Attachments Plugin and SMILE
Hello,

I've got a problem working with the Mapper Attachments Plugin. This code

    val json = XContentFactory.jsonBuilder().startObject()
      .field("file").startObject()
      .field("content").value(Base64.encodeBytes(bytes))
      .endObject()
      .endObject()
    client.prepareIndex(indexName, typeName)
      .setSource(json).execute().actionGet()

works perfectly well, but when I change jsonBuilder to smileBuilder, ES throws

    org.elasticsearch.index.mapper.MapperParsingException: failed to parse
      at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:540)
      at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:462)
      at org.elasticsearch.index.shard.service.InternalIndexShard.prepareCreate(InternalIndexShard.java:371)
      at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:215)
      at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:556)
      at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:426)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:744)
    Caused by: org.elasticsearch.common.jackson.core.JsonParseException: Current token (VALUE_STRING) not VALUE_EMBEDDED_OBJECT, can not access as binary
     at [Source: [B@3cb66faa; line: -1, column: 98325]
      at org.elasticsearch.common.jackson.core.JsonParser._constructError(JsonParser.java:1524)
      at org.elasticsearch.common.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:557)
      at org.elasticsearch.common.jackson.dataformat.smile.SmileParser.getBinaryValue(SmileParser.java:1212)
      at org.elasticsearch.common.jackson.core.JsonParser.getBinaryValue(JsonParser.java:1131)
      at org.elasticsearch.common.xcontent.json.JsonXContentParser.binaryValue(JsonXContentParser.java:183)
      at org.elasticsearch.index.mapper.attachment.AttachmentMapper.parse(AttachmentMapper.java:321)
      at org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:517)
      at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:459)
      at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:515)
      ... 8 more

and when I try not to encode the file,

    val json = XContentFactory.smileBuilder().startObject()
      .field("file").value(bytes)
      .endObject()

or

    val json = XContentFactory.smileBuilder().startObject()
      .field("file").startObject()
      .field("content").value(bytes)
      .endObject()
      .endObject()

ES throws

    org.elasticsearch.index.mapper.MapperParsingException: No content is provided.
      at org.elasticsearch.index.mapper.attachment.AttachmentMapper.parse(AttachmentMapper.java:337)
      at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:616)
      at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:469)
      at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:515)
      at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:462)
      at org.elasticsearch.index.shard.service.InternalIndexShard.prepareCreate(InternalIndexShard.java:371)
      at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:215)
      at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:556)
      at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:426)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:744)

I am using SMILE, so I would like to know whether there is a workaround, or whether I am doing something wrong.

Thanks in advance,
Paweł Młynarczyk
Re: Attachment plugin
Thank you very much for your answer.

As far as I can see, you are decoding the whole Elasticsearch response. I would like to decode it as a stream, so that I don't have to hold the whole file in memory. Is that possible with Elasticsearch?

On Monday, 17 March 2014 11:14:56 UTC+1, David Pilato wrote:
> This is somehow what we did in scrutmydocs.org project.
> It's not a plugin though.
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
> (In reply to Paweł Młynarczyk's message of 17 March 2014, 10:51.)
Re: Attachment plugin
Thank you very much for the answer.

(In reply to Paweł Młynarczyk's message of Monday, 17 March 2014 10:51:57 UTC+1.)
Attachment plugin
Hello,

I've already found the elasticsearch-mapper-attachments plugin, but I was wondering whether there is a plugin that lets users access indexed files via a URL. I mean that I'd like to download files directly from Elasticsearch using some special URL. Any thoughts on this?

Best regards,
Paweł Młynarczyk
Custom _all field and per field boosting
Hello,

I was really excited to hear that custom _all fields are coming, but the removal of per-field boosting spoiled my plan. All I want to achieve is to create multiple _all-like fields, each built from a different set of fields (each _all-like field would gather the data in one language), and to query against them. I also badly need to boost some of the fields (e.g. for books, the title is far more important than the publisher). Each of those _all-like fields would gather about 10 different fields.

My current solution is to store different languages in different indices, but that forces me to duplicate a lot of language-independent data. As stated in https://github.com/elasticsearch/elasticsearch/pull/4972, I tried to use multi_match, or even a query string query with multiple fields, but the performance of that kind of query is nowhere close to querying the _all field itself.

Is per-field boosting for custom _all fields planned to be reimplemented? Will there be a significant improvement in how cross-field multi_match performs? Can you think of another approach that would be worth trying? I'd be grateful for any hints.

Best regards,
Paweł Młynarczyk
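For reference, the multi_match approach mentioned above, with per-field boosts expressed through the `field^boost` syntax, could look roughly like the following request builder. The field naming scheme `<field>_<language>`, the field names `title` and `publisher`, and the boost values are assumptions for illustration only. The sketch shows the query shape and does not address the performance gap described in the post.

```python
import json


def language_query(text, language, boosts):
    """Build a multi_match query over the fields of one language.

    `boosts` maps a base field name to its boost. Fields are assumed to be
    named '<field>_<language>' (a hypothetical naming scheme).
    """
    fields = [
        "%s_%s^%s" % (name, language, boost)
        for name, boost in sorted(boosts.items())
    ]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


body = language_query("moby dick", "en", {"title": 3, "publisher": 1})
print(json.dumps(body, indent=2))
```

Keeping the boosts in a plain mapping makes it easy to reuse the same weights for every language while only the field suffix changes.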