Re: Looking for a best practice to get all data according to some filters

2014-12-11 Thread Dani Castro
Hi,
  I am facing the same situation:
We would like to get all the IDs of the documents matching certain 
criteria. In the worst case (which is the one I am describing here), the 
documents matching the criteria would be around 200K, and in our first 
tests it is really slow (around 15 seconds). However, if we run the same 
query just to count the documents, ES replies in just 10-15ms, which is 
amazing.
I suspect the problem is in the transport layer and the latency introduced 
by transferring a big JSON result.

In a situation like this, would you recommend using another transport 
layer such as Thrift, or a custom solution?
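
For reference, a minimal sketch of the kind of request involved (Python 
client; the index name and query are hypothetical), counting first and then 
collecting only the matching IDs via scan/scroll with _source disabled:

from elasticsearch import Elasticsearch

es = Elasticsearch()
criteria = {"match_all": {}}  # hypothetical filter; replace with the real one

# counting is cheap because no documents are fetched
print(es.count(index="my_index", body={"query": criteria})["count"])

# scan/scroll with _source disabled, so each hit carries little more than _id
resp = es.search(index="my_index",
                 body={"query": criteria, "_source": False},
                 search_type="scan", scroll="2m", size=500)
scroll_id = resp["_scroll_id"]
ids = []
while True:
    resp = es.scroll(scroll_id=scroll_id, scroll="2m")
    hits = resp["hits"]["hits"]
    if not hits:
        break
    ids.extend(hit["_id"] for hit in hits)
    scroll_id = resp["_scroll_id"]

If this is still slow with _source disabled, the bottleneck is probably not 
the size of the JSON being transferred.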

Thanks in advance

On Thursday, December 11, 2014 at 14:00:05 UTC+1, Ron Sher wrote:
>
> Just tested this.
> When I used a large number to get all of my documents according to some 
> criteria (4926 in the result) I got:
> 13.951s when using a size of 1M
> 43.6s when using scan/scroll (with a size of 100)
>
> Looks like I should be using the not recommended paging.
> Can I make the scroll better?
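>
> One thing worth trying (a hedged sketch with the Python client; the index 
> name and query are hypothetical): with search_type=scan the size applies 
> per shard, so size=100 on a five-shard index already returns up to 500 
> docs per round trip, and a larger per-shard size plus skipping _source 
> usually means fewer round trips for the same data.
>
> from elasticsearch import Elasticsearch, helpers
>
> es = Elasticsearch()
> hits = helpers.scan(
>     es,
>     index="my_index",                                      # hypothetical
>     query={"query": {"match_all": {}}, "_source": False},  # only _id per hit
>     scroll="2m",
>     size=1000,                                             # per shard; tune it
> )
> ids = [h["_id"] for h in hits]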
>
> Thanks,
> Ron
>
> On Wednesday, December 10, 2014 10:53:50 PM UTC+2, David Pilato wrote:
>>
>> No I did not say that. Or I did not mean that. Sorry if it was unclear.
>> I said: don’t use large sizes:
>>
>> Never use size:1000 or from:1000. 
>>>
>>
>> You should read this: 
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-scroll.html#scroll-scan
>>
>> -- 
>> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
>> @dadoonet | @elasticsearchfr | @scrutmydocs
>>
>>
>> On 10 December 2014 at 21:16, Ron Sher wrote:
>>
>> So you're saying there's no impact on Elasticsearch if I issue a large 
>> size? 
>> If that's the case, then why shouldn't I just use a size of 1M if I want to 
>> make sure I get everything?
>>
>> On Wednesday, December 10, 2014 8:22:47 PM UTC+2, David Pilato wrote:
>>>
>>> Scan/scroll is the best option to extract a huge amount of data.
>>> Never use size:1000 or from:1000. 
>>>
>>> It's not realtime because you basically scroll over a given set of 
>>> segments, and any new changes that arrive in new segments won't be taken 
>>> into account during the scroll.
>>> Which is good, because you won't get inconsistent results.
>>>
>>> About size, I would try and test. It depends on your document size, I 
>>> believe.
>>> Try with 1 and see how it goes as you increase it. You may well discover 
>>> that getting 10*1 docs is the same as 1*10. :)
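>>>
>>> For example, a quick way to run that test (a hedged sketch with the Python 
>>> client; the index name and query are hypothetical) is to time a full scan 
>>> pass at a few candidate sizes:
>>>
>>> import time
>>> from elasticsearch import Elasticsearch, helpers
>>>
>>> es = Elasticsearch()
>>> query = {"query": {"match_all": {}}, "_source": False}  # swap in your filter
>>>
>>> # time one complete scan/scroll pass per candidate size and compare
>>> for size in (100, 500, 1000, 5000):
>>>     start = time.time()
>>>     n = sum(1 for _ in helpers.scan(es, index="my_index", query=query, size=size))
>>>     print("size=%d: %d docs in %.1fs" % (size, n, time.time() - start))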
>>>
>>> Best
>>>
>>> David
>>>
>>> On 10 December 2014 at 19:09, Ron Sher wrote:
>>>
>>> Hi,
>>>
>>> I was wondering about best practices to get all data according to 
>>> some filters.
>>> The options as I see them are:
>>>
>>> - Use a very big size that will return all accounts, i.e. use some value 
>>>   like 1M to make sure I get everything back (even if I need just a few 
>>>   hundred or a few dozen documents). This is the quickest way, 
>>>   development-wise.
>>> - Use paging with size and from. This requires looping over the results, 
>>>   and performance gets worse as we advance to later pages. Also, we need 
>>>   to use preference if we want consistent results across pages, and it's 
>>>   not clear what the recommended page size is (see the sketch after this 
>>>   list).
>>> - Use scan/scroll. This gives consistent paging but also has several 
>>>   drawbacks: with search_type=scan the results can't be sorted; scan/scroll 
>>>   is (maybe) less performant than paging (the documentation says it's not 
>>>   for realtime use); and again it's not clear which size is recommended.
>>>
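>>> A rough sketch of the paging option above (Python client; the index name 
>>> and page size are hypothetical): a fixed preference string pins requests 
>>> to the same shard copies so pages stay consistent, though the cost still 
>>> grows with from.
>>>
>>> from elasticsearch import Elasticsearch
>>>
>>> es = Elasticsearch()
>>> page_size = 1000                      # hypothetical; worth benchmarking
>>> hits, offset = [], 0
>>> while True:
>>>     resp = es.search(
>>>         index="my_index",             # hypothetical index name
>>>         preference="my_session_123",  # any fixed string per client/session
>>>         body={"query": {"match_all": {}},
>>>               "from": offset, "size": page_size},
>>>     )
>>>     page = resp["hits"]["hits"]
>>>     if not page:
>>>         break
>>>     hits.extend(page)
>>>     offset += page_size
>>>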
>>> So you see - many options, and it's not clear which path to take.
>>>
>>> What do you think?
>>>
>>> Thanks,
>>> Ron
>>>

Re: Adding a field dynamically to search results

2014-09-26 Thread Dani Castro
Has nobody faced something like this before? :(


On Thursday, September 25, 2014 at 18:33:08 UTC+2, Dani Castro wrote:
>
> Hi, 
>
> I am a newbie to Elasticsearch and I am trying to figure out how to 
> achieve this:
>
> In our Elasticsearch cluster we have documents like this:
>
> {
> "hotel" : "Hilton Maldives",
> "location" : ...
> }
>
>
> and what we want to achieve is that when we search in Elasticsearch, it 
> replies with this:
>
>
> {
> "hotel" : "Hilton Maldives",
> "price" : 100$
> "location" : ...
> }'
>
>
> This dynamic or synthetic field will come from an external service.
>
> Of course, we can achieve this after retrieving all the hits of a search, 
> but for us performance is a must, and we would like to add this while the 
> ES cluster is retrieving the items: that means, for example, that with 100 
> matching documents on 4 shards distributed across 4 servers, each server 
> would add the "price" field to 25 documents (assuming the 100 matches are 
> spread evenly).
>
> As I understand it, we need to create a plugin for Elasticsearch, but 
> after two days of looking into where and how, I have not found any clear 
> reference on how to do it.
>
> I would really appreciate any suggestions or help with this.
>
> Thanks in advance!
>
>



Adding a field dynamically to search results

2014-09-25 Thread Dani Castro
Hi, 

I am a newbie to Elasticsearch and I am trying to figure out how to achieve 
this:

In our Elasticsearch cluster we have documents like this:

{
"hotel" : "Hilton Maldives",
"location" : ...
}


and what we want to achieve is that when we search in Elasticsearch, it 
replies with this:


{
"hotel" : "Hilton Maldives",
"price" : 100$
"location" : ...
}'


This dynamic or synthetic field will come from an external service.

Of course, we can achieve this after retrieving all the hits of a search, 
but for us performance is a must, and we would like to add this while the 
ES cluster is retrieving the items: that means, for example, that with 100 
matching documents on 4 shards distributed across 4 servers, each server 
would add the "price" field to 25 documents (assuming the 100 matches are 
spread evenly).
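
For reference, a minimal sketch of the post-retrieval variant mentioned 
above (Python client; the index name, query and pricing-service call are 
hypothetical), fetching prices for a whole page of hits in one batch rather 
than once per document:

from elasticsearch import Elasticsearch

def fetch_prices(hotel_ids):
    # placeholder for the external pricing service; assumed to accept a
    # batch of hotel IDs and return a {hotel_id: price} mapping
    return {hotel_id: "100$" for hotel_id in hotel_ids}

es = Elasticsearch()
resp = es.search(index="hotels", body={"query": {"match": {"hotel": "Hilton"}}})
hits = resp["hits"]["hits"]

# one batched lookup, then merge the synthetic field into each hit
prices = fetch_prices([h["_id"] for h in hits])
for h in hits:
    h["_source"]["price"] = prices.get(h["_id"])

Doing the enrichment shard-side likely needs a plugin, but the batched 
client-side merge is often worth measuring first.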

As I understand it, we need to create a plugin for Elasticsearch, but after 
two days of looking into where and how, I have not found any clear 
reference on how to do it.

I would really appreciate any suggestions or help with this.

Thanks in advance!
