Yes, this is the equivalent of using RAMDirectory. Please don't use this. 
Mmap is optimized for random access, and if the Lucene index can fit in the 
heap (to use a RAM directory), it can certainly fit in OS RAM, without the 
implications of loading it onto the heap.
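
For reference, the memory-mapped store recommended above can be selected 
explicitly. A minimal sketch, assuming Elasticsearch 1.x settings syntax 
(the index name `myindex` is illustrative, not from this thread):

```shell
# Per index at creation time (ES 1.x): pick the mmapfs store type,
# i.e. Lucene's MMapDirectory, instead of the heap-backed memory store.
curl -XPUT 'http://localhost:9200/myindex' -d '{
  "settings": { "index.store.type": "mmapfs" }
}'

# Node-wide alternative in elasticsearch.yml:
#   index.store.type: mmapfs
```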

On Monday, July 7, 2014 6:26:07 PM UTC+2, Mahesh Venkat wrote:
>
> Thanks Shay for updating us with perf improvements.
> Apart from using the default parameters, should we follow the guideline 
> listed in 
>
>
> http://elasticsearch-users.115913.n3.nabble.com/Is-ES-es-index-store-type-memory-equivalent-to-Lucene-s-RAMDirectory-td4057417.html
>  
>
> Lucene supports MMapDirectory during the data indexing phase (in a batch), 
> and one can switch to in-memory for queries to optimize search latency.
>
> Should we use the JVM system parameter -Des.index.store.type=memory? Isn't 
> this equivalent to using RAMDirectory in Lucene for in-memory search 
> queries?
> Thanks
> --Mahesh
>
> On Saturday, July 5, 2014 8:46:59 AM UTC-7, kimchy wrote:
>>
>> Heya, I worked a bit on it, and 1.x (upcoming 1.3) now has some significant 
>> perf improvements for this case (including Lucene-side improvements that 
>> are in ES for now, but will be in the next Lucene version). Those include:
>>
>> 6648: https://github.com/elasticsearch/elasticsearch/pull/6648
>> 6714: https://github.com/elasticsearch/elasticsearch/pull/6714
>> 6707: https://github.com/elasticsearch/elasticsearch/pull/6707
>>
>> It would be interesting if you could run the tests again with the 1.x 
>> branch. Also note: please use the default settings in ES for now; no 
>> disabling of flushing and such.
>>
>> On Friday, June 13, 2014 7:57:23 AM UTC+2, Maco Ma wrote:
>>>
>>> I am trying to measure the performance of ingesting documents that have 
>>> many fields.
>>>
>>>
>>> The latest elasticsearch 1.2.1:
>>> Total docs count: 10k (a small set definitely)
>>> ES_HEAP_SIZE: 48G
>>> settings:
>>>
>>> {
>>>   "doc": {
>>>     "settings": {
>>>       "index": {
>>>         "uuid": "LiWHzE5uQrinYW1wW4E3nA",
>>>         "number_of_replicas": "0",
>>>         "translog": { "disable_flush": "true" },
>>>         "number_of_shards": "5",
>>>         "refresh_interval": "-1",
>>>         "version": { "created": "1020199" }
>>>       }
>>>     }
>>>   }
>>> }
>>>
>>> mappings:
>>>
>>> {
>>>   "doc": {
>>>     "mappings": {
>>>       "type": {
>>>         "dynamic_templates": [
>>>           { "t1": { "mapping": { "store": false, "norms": { "enabled": false }, "type": "string" }, "match": "*_ss" } },
>>>           { "t2": { "mapping": { "store": false, "type": "date" }, "match": "*_dt" } },
>>>           { "t3": { "mapping": { "store": false, "type": "integer" }, "match": "*_i" } }
>>>         ],
>>>         "_source": { "enabled": false },
>>>         "properties": {}
>>>       }
>>>     }
>>>   }
>>> }
>>>
>>> All fields in the documents match the templates in the mappings.
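
(Editor's note: a document matching those three dynamic templates would look 
roughly like the sketch below. The field names are illustrative; only the 
suffixes *_ss, *_dt, and *_i and the index/type names are taken from the 
settings and mappings quoted above.)

```shell
# Illustrative document whose field suffixes match the templates:
# *_ss -> string with norms disabled (t1), *_dt -> date (t2), *_i -> integer (t3).
curl -XPUT 'http://localhost:9200/doc/type/1' -d '{
  "title_ss": "some text",
  "created_dt": "2014-06-13T00:00:00Z",
  "count_i": 42
}'
```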
>>>
>>> Since I disabled flush & refresh, I submitted the flush command (along 
>>> with an optimize command after it) from the client program every 10 
>>> seconds. (I also tried an interval of 10 minutes and got similar results.)
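
(Editor's note: the periodic flush + optimize the client issues can be 
sketched with the ES 1.x REST endpoints; the index name `doc` is taken from 
the settings quoted above.)

```shell
# Flush the translog to the Lucene index, then merge segments (ES 1.x API).
curl -XPOST 'http://localhost:9200/doc/_flush'
curl -XPOST 'http://localhost:9200/doc/_optimize'
```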
>>>
>>> Scenario 0 - 10k docs have 1000 different fields:
>>> Ingestion took 12 secs.  Only 1.08G of heap memory is used (the figure 
>>> reflects used heap only).
>>>
>>>
>>> Scenario 1 - 10k docs have 10k different fields (10 times the fields of 
>>> scenario 0):
>>> This time ingestion took 29 secs.  5.74G of heap memory is used.
>>>
>>> Not sure why the performance degrades sharply.
>>>
>>> If I try to ingest docs having 100k different fields, it takes 17 mins 
>>> 44 secs.  We only have 10k docs in total, and I am not sure why ES 
>>> performs so badly.
>>>
>>> Can anyone give suggestions to improve the performance?
>>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/450fdf38-bdfe-49c2-9938-627b9854892c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.