Function score query does NOT replace boost field well - how to access child's fields?!

2014-05-13 Thread Paweł Młynarczyk
Hello

The boost field has been deprecated since ES 1.0.0, so I've decided to 
replace it with the function score query, as suggested in the docs.

So I tried to change this:

https://gist.github.com/zwrss/aeaf2828f6dd35a1e888#file-boost-field

Into this:

https://gist.github.com/zwrss/aeaf2828f6dd35a1e888#file-function-score

The problem is that I can't use the child's fields inside the top_children 
query. How should I handle this?

Thanks in advance

Paweł Młynarczyk

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/4d68bab8-30cf-4382-bf78-36753b9f3ca6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: filtered query vs query performance

2014-04-07 Thread Paweł Młynarczyk
I've tried querying the children first and then doing some ID filtering on 
the parents, but the scoring is then broken and the performance is even 
worse (as expected). I still have not found a satisfactory solution to this 
problem.

On Thursday, 3 April 2014 14:44:37 UTC+2, Paweł Młynarczyk wrote:
>
> Hello
>
> I have a parent-child index in my app (mapping gist: 
> https://gist.github.com/zwrss/9953291#file-mapping). I have about 250,000 
> parents and 7,000,000 children.
> I was trying to speed up my top_children query (gist: 
> https://gist.github.com/zwrss/9953291#file-parent-query) a bit, so I 
> tried adding some filters to ensure that the query has to score only about 
> 300 parents (gist: 
> https://gist.github.com/zwrss/9953291#file-filtered-parent-query).
> The effect was surprising: the original query executed in ~150 ms 
> and the filtered one in ~270 ms.
> I had expected that adding filters to reduce the scoring work would help.
> I suspected that top_children was the culprit, so I decided to do some 
> testing: I queried only the children and added some filters to narrow the 
> results.
> The original child query (gist: 
> https://gist.github.com/zwrss/9953291#file-child-query) executed in ~90 
> ms and the filtered one (gist: 
> https://gist.github.com/zwrss/9953291#file-filtered-child-query) in ~120 
> ms.
>
> Is that the correct behaviour? Am I missing something?
>
> best regards
> Paweł Młynarczyk
>



Re: Sub-aggregations not working as expected

2014-04-06 Thread Paweł Młynarczyk
Hello Chris

Elasticsearch does not treat the entries of your 'ingredients' list as 
groups of logically connected fields. So when your top-level aggregation 
returns e.g. rock, your sub-aggregation does not compute stats for the rock 
entries only, but for all the ingredients contained in the documents that 
also contain rock.

You could try to index those materials as child documents for the main 
index and then just do aggs over the child type.
You may also want to read this:
http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/
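
The effect described above can be reproduced outside Elasticsearch. The following is a minimal Python sketch, not Elasticsearch code: it assumes that an array of plain objects is indexed as one multi-valued field per key, so the link between a name and its quantity is lost. The first document is the one from the question; the second document and all function names are made up for illustration, and the numbers a real cluster returns may differ in detail (for example in how repeated values inside one document are counted).

```python
from collections import defaultdict

# First document is taken from the question; the second is hypothetical.
docs = [
    {"ingredients": [
        {"name": "Rock", "quantity": 6},
        {"name": "Dirt", "quantity": 6},
        {"name": "Mortar", "quantity": 3},
    ]},
    {"ingredients": [
        {"name": "Rock", "quantity": 8},
        {"name": "Sand", "quantity": 1},
    ]},
]


def flatten(doc):
    """Model (assumption): object arrays become one multi-valued field per key."""
    return {
        "ingredients.name": [i["name"] for i in doc["ingredients"]],
        "ingredients.quantity": [i["quantity"] for i in doc["ingredients"]],
    }


def average(values_by_name):
    return {name: sum(v) / len(v) for name, v in values_by_name.items()}


def flattened_avg(docs):
    """Bucket by name, then average every quantity of each matching document."""
    buckets = defaultdict(list)
    for doc in map(flatten, docs):
        for name in set(doc["ingredients.name"]):
            buckets[name].extend(doc["ingredients.quantity"])
    return average(buckets)


def per_ingredient_avg(docs):
    """What the question actually asks for: each name keeps its own quantity."""
    buckets = defaultdict(list)
    for doc in docs:
        for ingredient in doc["ingredients"]:
            buckets[ingredient["name"]].append(ingredient["quantity"])
    return average(buckets)


print(flattened_avg(docs)["Rock"])       # mixes in Dirt, Mortar and Sand
print(per_ingredient_avg(docs)["Rock"])  # only the two Rock entries
```

In this toy model the flattened average for Rock is 4.8, while the per-ingredient average is 7.0. Keeping each ingredient as its own document (a child document, as suggested above) preserves the pairing.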

Best regards
Paweł Młynarczyk

On Monday, 7 April 2014 02:16:38 UTC+2, chris Hahn wrote:
>
> Playing with elasticsearch for a project.
> This is sample data, every document will have a list of ingredients, some 
> of the ingredients may be used in different documents and in different 
> quantities.  I would like an aggregation that lists all ingredients for a 
> search, and the stats of each ingredient.
>
> I'm looking to this group for suggestions: can I structure my query 
> differently to get the results I would like (or should I structure my 
> document differently)?
> Basically I have two constraints:  I don't know what the ingredients are 
> when the query is written.  I would like to list all ingredients, and the 
> average amounts of each.
>
> Sample data:
> {
>   "ingredients": [
>     {
>       "name": "Rock",
>       "quantity": 6,
>       "unit": "lb"
>     },
>     {
>       "name": "Dirt",
>       "quantity": 6,
>       "unit": "lb"
>     },
>     {
>       "name": "Mortar",
>       "quantity": 3,
>       "unit": "lb"
>     }
>   ]
> }
>
> This query looks like it works, but doesn't. I'm quite confused as to 
> where these numbers are coming from.
> Query:
> POST /concrete/recipe/_search
> {
>   "query": { "match_all": {} },
>   "aggs": {
>     "ingredients": {
>       "terms": {
>         "field": "ingredients.name"
>       },
>       "aggs": {
>         "pounds": { "stats": { "field": "ingredients.quantity" } }
>       }
>     }
>   }
> }
>
> Results:
> {
>   "took": 4,
>   "timed_out": false,
>   "_shards": {
>     "total": 1,
>     "successful": 1,
>     "failed": 0
>   },
>   "hits": {
>     "total": 2,
>     "max_score": 1,
>     "hits": [
>       {
>         "_index": "concrete",
>         "_type": "recipe",
>         "_id": "1",
>         "_score": 1,
>         "_source": {
>           "ingredients": [
>             {
>               "name": "Rock",
>               "quantity": 6,
>               "unit": "lb"
>             },
>             {
>               "name": "Dirt",
>               "quantity": 6,
>               "unit": "lb"
>             },
>             {
>               "name": "Mortar",
>               "quantity": 3,
>               "unit": "lb"
>             }
>           ]
>         }
>       },
>       {
>         "_index": "concrete",
>         "_type": "recipe",
>         "_id": "2",
>         "_score": 1,
>         "_source": {
>           "ingredients": [
>             {
>               "name": "Rock",
>               "quantity": 8,
>               "unit": "lb"
>             },
>             {
>               "name": "Quartz",
>               "quantity": 0.5,
>               "unit": "l

Re: filtered query vs query performance

2014-04-03 Thread Paweł Młynarczyk
Update:
Filtered aliases allow me to speed up the query (by about 50%). The thing 
is, I can't use aliases, since the filter I am applying is not static. Why 
does a filtered alias perform so much better than a filtered query?




filtered query vs query performance

2014-04-03 Thread Paweł Młynarczyk
Hello

I have a parent-child index in my app (mapping gist: 
https://gist.github.com/zwrss/9953291#file-mapping). I have about 250,000 
parents and 7,000,000 children.
I was trying to speed up my top_children query (gist: 
https://gist.github.com/zwrss/9953291#file-parent-query) a bit, so I tried 
adding some filters to ensure that the query has to score only about 300 
parents (gist: 
https://gist.github.com/zwrss/9953291#file-filtered-parent-query).
The effect was surprising: the original query executed in ~150 ms and 
the filtered one in ~270 ms.
I had expected that adding filters to reduce the scoring work would help.
I suspected that top_children was the culprit, so I decided to do some 
testing: I queried only the children and added some filters to narrow the 
results.
The original child query (gist: 
https://gist.github.com/zwrss/9953291#file-child-query) executed in ~90 ms 
and the filtered one (gist: 
https://gist.github.com/zwrss/9953291#file-filtered-child-query) in ~120 ms.
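
For readers without access to the gists, here is a rough Python sketch of the general shape of the two request bodies being compared, written in the ES 1.x query DSL. It is not the content of the gists: the child type name, the field names and the filter values are all hypothetical placeholders.

```python
import json


def top_children(child_type, child_query):
    """Parent query scored from matching children (ES 1.x top_children)."""
    return {"top_children": {"type": child_type,
                             "query": child_query,
                             "score": "max"}}


def filtered(query, filter_clause):
    """Wrap a query in an ES 1.x filtered query."""
    return {"filtered": {"query": query, "filter": filter_clause}}


# Hypothetical child type, fields and values, for illustration only.
child_query = {"match": {"text": "some search phrase"}}
parent_filter = {"terms": {"category_id": [1, 2, 3]}}

plain_body = {"query": top_children("child", child_query)}
filtered_body = {"query": filtered(top_children("child", child_query),
                                   parent_filter)}

print(json.dumps(filtered_body, indent=2))
```

The two bodies differ only in the surrounding filtered wrapper, which is the part that was expected to reduce the work but did not.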

Is that the correct behaviour? Am I missing something?

best regards
Paweł Młynarczyk



Re: Thread Pools and Queues

2014-03-25 Thread Paweł Młynarczyk
Thank you for your answer.

I did some tests and it appears that the smallest queue size among the data 
nodes is the effective one (as far as I understand, search requests are 
spread across all nodes that hold data), and queue sizes on non-data nodes 
are ignored (I guess they are not literally ignored, but forwarding search 
requests to data nodes bypasses the local queue). So the proposed solution 
does not work. Or am I missing something?

Paweł Młynarczyk

On Monday, 24 March 2014 15:17:22 UTC+1, Jörg Prante wrote:
>
> You could set up a special node for "low priority" search, with "slim" 
> thread pool settings, and forward client calls to it respectively.
>
> Jörg
>
>
> On Mon, Mar 24, 2014 at 3:08 PM, Paweł Młynarczyk wrote:
>
>> There are many users working with the application. Some of them are 
>> online users, whose requests should be served in real time, and some are 
>> workers preparing reports, whose requests can wait. At the moment, all 
>> requests arriving at Elasticsearch are placed in the same queue, which 
>> sometimes results in online users not getting their requests served as 
>> soon as possible, or even getting rejected when the queue is full of 
>> workers' requests. This could be solved by adding a separate thread pool 
>> and queue for 'low priority search'. The same would apply to other 
>> operations (get, bulk, etc.).
>>
>> Are there any plans or any chance for this kind of feature being 
>> implemented in Elasticsearch?



Thread Pools and Queues

2014-03-24 Thread Paweł Młynarczyk
 

There are many users working with the application. Some of them are online 
users, whose requests should be served in real time, and some are workers 
preparing reports, whose requests can wait. At the moment, all requests 
arriving at Elasticsearch are placed in the same queue, which sometimes 
results in online users not getting their requests served as soon as 
possible, or even getting rejected when the queue is full of workers' 
requests. This could be solved by adding a separate thread pool and queue 
for 'low priority search'. The same would apply to other operations 
(get, bulk, etc.).
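
To make the requested behaviour concrete, here is a small Python sketch of the idea: two bounded queues, where a full low-priority queue never causes high-priority requests to be rejected. This is only an illustration of the concept; it is not an Elasticsearch setting or API, and the class and method names are invented.

```python
from collections import deque


class PriorityLanes:
    """Two bounded FIFO queues; high-priority work is always served first."""

    def __init__(self, high_capacity, low_capacity):
        self._lanes = {"high": deque(), "low": deque()}
        self._capacity = {"high": high_capacity, "low": low_capacity}

    def submit(self, request, priority="high"):
        """Return True if queued, False if that lane is full (rejected)."""
        lane = self._lanes[priority]
        if len(lane) >= self._capacity[priority]:
            return False
        lane.append(request)
        return True

    def take(self):
        """Pop the next request to serve, or None when both lanes are empty."""
        for name in ("high", "low"):
            if self._lanes[name]:
                return self._lanes[name].popleft()
        return None


lanes = PriorityLanes(high_capacity=2, low_capacity=1)
lanes.submit("report-1", priority="low")
rejected = not lanes.submit("report-2", priority="low")  # low lane is full
accepted = lanes.submit("online-1")                      # unaffected by reports
order = [lanes.take(), lanes.take(), lanes.take()]
print(rejected, accepted, order)
```

Here the second report is rejected, while the online request is still accepted and served before the queued report.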

Are there any plans or any chance for this kind of feature being 
implemented in Elasticsearch?



Mapper Attachments Plugin and SMILE

2014-03-20 Thread Paweł Młynarczyk
Hello

I've got a problem working with Mapper Attachments Plugin.

This code


val json = XContentFactory.jsonBuilder().startObject()
  .field("file").startObject()
    .field("content").value(Base64.encodeBytes(bytes))
  .endObject()
.endObject()

client.prepareIndex(indexName, typeName)
  .setSource(json).execute().actionGet()


works perfectly well, but when I change jsonBuilder to smileBuilder, ES 
throws

org.elasticsearch.index.mapper.MapperParsingException: failed to parse
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:540)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:462)
    at org.elasticsearch.index.shard.service.InternalIndexShard.prepareCreate(InternalIndexShard.java:371)
    at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:215)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:556)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:426)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.common.jackson.core.JsonParseException: Current token (VALUE_STRING) not VALUE_EMBEDDED_OBJECT, can not access as binary
 at [Source: [B@3cb66faa; line: -1, column: 98325]
    at org.elasticsearch.common.jackson.core.JsonParser._constructError(JsonParser.java:1524)
    at org.elasticsearch.common.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:557)
    at org.elasticsearch.common.jackson.dataformat.smile.SmileParser.getBinaryValue(SmileParser.java:1212)
    at org.elasticsearch.common.jackson.core.JsonParser.getBinaryValue(JsonParser.java:1131)
    at org.elasticsearch.common.xcontent.json.JsonXContentParser.binaryValue(JsonXContentParser.java:183)
    at org.elasticsearch.index.mapper.attachment.AttachmentMapper.parse(AttachmentMapper.java:321)
    at org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:517)
    at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:459)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:515)
    ... 8 more

and when I try not to encode the file

val json = XContentFactory.smileBuilder().startObject()
  .field("file").value(bytes)
.endObject()

or

val json = XContentFactory.smileBuilder().startObject()
  .field("file").startObject()
    .field("content").value(bytes)
  .endObject()
.endObject()

ES throws 

org.elasticsearch.index.mapper.MapperParsingException: No content is provided.
    at org.elasticsearch.index.mapper.attachment.AttachmentMapper.parse(AttachmentMapper.java:337)
    at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:616)
    at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:469)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:515)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:462)
    at org.elasticsearch.index.shard.service.InternalIndexShard.prepareCreate(InternalIndexShard.java:371)
    at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:215)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:556)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:426)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)




I am using SMILE, so I am curious whether there is a workaround, or whether 
I am doing something wrong.

Thanks in advance

Paweł Młynarczyk



Re: Attachment plugin

2014-03-17 Thread Paweł Młynarczyk
Thank you very much for your answer.

As far as I can see, you are decoding the whole Elasticsearch response. I 
would like to do streaming decoding instead, so that I don't have to hold 
the whole file in memory. Is that possible with Elasticsearch?

On Monday, 17 March 2014 11:14:56 UTC+1, David Pilato wrote:
>
> This is somehow what we did in scrutmydocs.org project.
> It's not a plugin though.
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
>
> On 17 March 2014 at 10:51, Paweł Młynarczyk wrote:
>
> Hello
>
> I've found the elasticsearch-mapper-attachments plugin already, but I was 
> wondering if there is a plugin that allows users to access indexed files 
> via a URL. I mean that I'd like to download files directly from 
> Elasticsearch using some special URL. Any thoughts on this?
>
> Best regards
>
> Paweł Młynarczyk
>
>



Re: Attachment plugin

2014-03-17 Thread Paweł Młynarczyk
Thank you very much for the answer.

On Monday, 17 March 2014 10:51:57 UTC+1, Paweł Młynarczyk wrote:
>
> Hello
>
> I've found the elasticsearch-mapper-attachments plugin already, but I was 
> wondering if there is a plugin that allows users to access indexed files 
> via a URL. I mean that I'd like to download files directly from 
> Elasticsearch using some special URL. Any thoughts on this?
>
> Best regards
>
> Paweł Młynarczyk
>



Attachment plugin

2014-03-17 Thread Paweł Młynarczyk
Hello

I've found the elasticsearch-mapper-attachments plugin already, but I was 
wondering if there is a plugin that allows users to access indexed files 
via a URL. I mean that I'd like to download files directly from 
Elasticsearch using some special URL. Any thoughts on this?

Best regards

Paweł Młynarczyk



Custom _all field and per field boosting

2014-02-21 Thread Paweł Młynarczyk
Hello

I was really excited to hear that custom _all fields are coming, but the 
removal of per-field boosting upset my plans.
All I want to achieve is to create multiple _all-like fields, each built 
from a different set of fields (each _all-like field would gather data in a 
different language), and to query against them. I also badly need to boost 
some of the fields (e.g. for books, the title is far more important than the 
publisher). Each of those _all-like fields would gather about 10 source 
fields. My current solution is to store different languages in different 
indices, but that forces me to duplicate a lot of language-independent data. 
As suggested in https://github.com/elasticsearch/elasticsearch/pull/4972, I 
tried to use multi_match or even a query string query with multiple fields, 
but the performance of that kind of query is nowhere near that of querying 
the _all field itself.
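
For reference, this is roughly the kind of query meant by the multi_match alternative, sketched as a small Python helper that builds the request body. The field names and boost values are hypothetical; only the multi_match structure and the field^boost notation come from the Elasticsearch query DSL.

```python
def boosted_multi_match(text, boosts):
    """Build a multi_match query; boosts maps field name to boost factor."""
    fields = [name if boost == 1 else f"{name}^{boost}"
              for name, boost in boosts.items()]
    return {"multi_match": {"query": text, "fields": fields}}


# Hypothetical English-language fields of a book document.
query_en = boosted_multi_match(
    "moby dick",
    {"title_en": 5, "author_en": 2, "publisher_en": 1},
)
print(query_en)
```

Every listed field is searched separately at query time, which is where the extra cost compared to a single _all-like field comes from.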

Is per-field boosting for custom _all fields planned to be reimplemented? 
Will there be a significant improvement in how cross-field multi_match 
performs?
Could you think of another approach that could be worth trying?

I'd be grateful for any hints.


Best regards,

Paweł Młynarczyk
