I used the curl command to do the ingestion (one curl invocation per document) and the flush. I also tried Solr with the same data and commands (with the soft/hard auto-commits disabled and commits issued from the client program), and its performance did not degrade. Both are built on Lucene, so I am not sure why there is such a big performance difference between them.
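Concretely, the commands were along these lines (a rough sketch only -- the host, index, type, collection, and field names below are placeholders, not my exact setup):

    # Elasticsearch: index a single document (one curl call per doc)
    curl -XPOST 'http://localhost:9200/doc/type/1' -d '{
      "field1_ss": "value1",
      "field2_dt": "2014-06-13T00:00:00",
      "field3_i": 42
    }'

    # Explicit flush (automatic flush/refresh are disabled in the index settings),
    # followed by an optimize
    curl -XPOST 'http://localhost:9200/doc/_flush'
    curl -XPOST 'http://localhost:9200/doc/_optimize'

    # Solr equivalent: explicit commit issued from the client side
    curl 'http://localhost:8983/solr/collection1/update?commit=true'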
On Friday, June 13, 2014 2:02:58 PM UTC+8, Mark Walkom wrote:
> It's not surprising that the time increases when you have an order of
> magnitude more fields.
>
> Are you using the bulk API?
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com
> web: www.campaignmonitor.com
>
> On 13 June 2014 15:57, Maco Ma <mayao...@gmail.com> wrote:
>
>> I am trying to measure the performance of ingesting documents that have
>> lots of fields.
>>
>> The latest Elasticsearch, 1.2.1:
>> Total doc count: 10k (a small set, definitely)
>> ES_HEAP_SIZE: 48G
>>
>> Settings:
>>
>> {
>>   "doc": {
>>     "settings": {
>>       "index": {
>>         "uuid": "LiWHzE5uQrinYW1wW4E3nA",
>>         "number_of_replicas": "0",
>>         "translog": { "disable_flush": "true" },
>>         "number_of_shards": "5",
>>         "refresh_interval": "-1",
>>         "version": { "created": "1020199" }
>>       }
>>     }
>>   }
>> }
>>
>> Mappings:
>>
>> {
>>   "doc": {
>>     "mappings": {
>>       "type": {
>>         "dynamic_templates": [
>>           { "t1": { "match": "*_ss", "mapping": { "store": false, "norms": { "enabled": false }, "type": "string" } } },
>>           { "t2": { "match": "*_dt", "mapping": { "store": false, "type": "date" } } },
>>           { "t3": { "match": "*_i", "mapping": { "store": false, "type": "integer" } } }
>>         ],
>>         "_source": { "enabled": false },
>>         "properties": {}
>>       }
>>     }
>>   }
>> }
>>
>> All fields in the documents match the templates in the mappings.
>>
>> Since I disabled flush & refresh, I submitted the flush command (followed
>> by an optimize command) from the client program every 10 seconds. (I also
>> tried a 10-minute interval and got similar results.)
>>
>> Scenario 0 - 10k docs with 1,000 distinct fields:
>> Ingestion took 12 secs. Only 1.08G of heap memory was used (this counts
>> only the used heap).
>>
>> Scenario 1 - 10k docs with 10k distinct fields (10x the fields of
>> scenario 0):
>> This time ingestion took 29 secs. Only 5.74G of heap was used.
>>
>> Not sure why the performance degrades so sharply.
>>
>> If I ingest docs with 100k distinct fields, it takes 17 mins 44 secs. We
>> only have 10k docs in total, and I am not sure why ES performs so badly.
>>
>> Can anyone give suggestions to improve the performance?
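To answer the bulk API question: no, I index one document per request. My understanding is that a bulk request would look roughly like the sketch below (the index, type, and field names are again placeholders); each action line is followed by its document source, newline-delimited:

    # Hypothetical _bulk request; --data-binary preserves the required newlines
    curl -XPOST 'http://localhost:9200/doc/type/_bulk' --data-binary '
    {"index":{"_id":"1"}}
    {"field1_ss":"value1","field3_i":42}
    {"index":{"_id":"2"}}
    {"field1_ss":"value2","field3_i":43}
    '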