I used the curl command to do the ingestion (one command per document) and 
the flush. I also tried Solr (with soft/hard auto-commit disabled, committing 
from the client program instead) using the same data and commands, and its 
performance did not degrade. Both are built on Lucene, so I am not sure why 
there is such a big difference in performance.
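
For reference, the ingestion was one curl request per document, roughly like 
this (a sketch only; the host and field values are placeholders, and the 
index/type names are the ones from the settings quoted below):

curl -XPUT 'http://localhost:9200/doc/type/1' -d '{"field1_ss":"some text","field2_dt":"2014-06-13T08:00:00","field3_i":1}'

One such request is issued for each of the 10k documents.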

On Friday, June 13, 2014 2:02:58 PM UTC+8, Mark Walkom wrote:
>
> It's not surprising that the time increases when you have an order of 
> magnitude more fields.
>
> Are you using the bulk API?
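>
> (For reference, a single bulk request looks roughly like the sketch below; 
> the index/type names and field values are placeholders. Each action line 
> is followed by the document source, and the body must end with a newline.)
>
> curl -XPOST 'http://localhost:9200/doc/type/_bulk' --data-binary '
> {"index":{"_id":"1"}}
> {"field1_ss":"v1","field1_i":1}
> {"index":{"_id":"2"}}
> {"field2_ss":"v2","field2_i":2}
> '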
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com
> web: www.campaignmonitor.com
>  
>
> On 13 June 2014 15:57, Maco Ma <mayao...@gmail.com> wrote:
>
>> I am trying to measure the performance of ingesting documents that have 
>> lots of fields.
>>
>>
>> The latest Elasticsearch, 1.2.1:
>> Total doc count: 10k (definitely a small set)
>> ES_HEAP_SIZE: 48G
>> settings:
>>
>> {"doc":{"settings":{"index":{"uuid":"LiWHzE5uQrinYW1wW4E3nA","number_of_replicas":"0","translog":{"disable_flush":"true"},"number_of_shards":"5","refresh_interval":"-1","version":{"created":"1020199"}}}}}
>>
>> mappings:
>>
>> {"doc":{"mappings":{"type":{"dynamic_templates":[{"t1":{"mapping":{"store":false,"norms":{"enabled":false},"type":"string"},"match":"*_ss"}},{"t2":{"mapping":{"store":false,"type":"date"},"match":"*_dt"}},{"t3":{"mapping":{"store":false,"type":"integer"},"match":"*_i"}}],"_source":{"enabled":false},"properties":{}}}}}
>>
>> All fields in the documents match the templates in the mappings.
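>>
>> For example, a document looks like this (illustrative field names; the 
>> suffixes _ss, _dt and _i select templates t1, t2 and t3 respectively):
>>
>> {"field1_ss":"some text","field2_dt":"2014-06-13T08:00:00","field3_i":123}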
>>
>> Since I disabled flush & refresh, I submit a flush command (followed by 
>> an optimize command) from the client program every 10 seconds. (I also 
>> tried a 10-minute interval and got similar results.)
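>>
>> Concretely, the periodic commands are just these two (assuming the node 
>> is on localhost:9200; _optimize is the ES 1.x name of the endpoint):
>>
>> curl -XPOST 'http://localhost:9200/doc/_flush'
>> curl -XPOST 'http://localhost:9200/doc/_optimize'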
>>
>> Scenario 0 - 10k docs with 1,000 distinct fields:
>> Ingestion took 12 secs, and only 1.08G of heap was used (all heap figures 
>> here are used heap, not the full allocation).
>>
>>
>> Scenario 1 - 10k docs with 10k distinct fields (10x the fields of 
>> scenario 0):
>> This time ingestion took 29 secs, and 5.74G of heap was used.
>>
>> I am not sure why the performance degrades so sharply.
>>
>> If I ingest docs with 100k distinct fields, it takes 17 mins 44 secs. We 
>> only have 10k docs in total, and I am not sure why ES performs so badly.
>>
>> Can anyone suggest ways to improve the performance?
>
>

