Hi Kimchy,

I reran the benchmark using ES 1.3 with default settings (only _source and _all disabled), and it shows great progress on performance. However, Solr still outperforms ES 1.3.

Results by number of distinct metadata fields (configurations as in the original table: ES, ES with _all/codec bloom filter disabled, ES 1.3, Solr; "secs per 1k docs" is the ingestion time of each successive 1k-doc batch):

Scenario 0: 1,000 distinct fields (10k docs)
  ES:                       12 secs -> 833 docs/sec; CPU 30.24%; iowait 0.02%; heap 1.08 GB; index size 36 MB; secs per 1k docs: 3 1 1 1 1 1 0 1 2 1
  ES (_all/bloom disabled): 13 secs -> 769 docs/sec; CPU 23.68%; iowait 0.01%; heap 1.31 GB; index size 248 KB; secs per 1k docs: 2 1 1 1 1 1 1 1 2 1
  ES 1.3:                   13 secs -> 769 docs/sec; CPU 44.22%; iowait 0.01%; heap 1.38 GB; index size 69 MB; secs per 1k docs: 2 1 1 1 1 1 2 0 2 2
  Solr:                     13 secs -> 769 docs/sec; CPU 28.85%; heap 9.39 GB; secs per 1k docs: 2 1 1 1 1 1 1 1 2 2

Scenario 1: 10k distinct fields
  ES:                       29 secs -> 345 docs/sec; CPU 40.83%; iowait 0.02%; heap 5.74 GB; index size 36 MB; secs per 1k docs: 14 2 2 2 1 2 2 1 2 1
  ES (_all/bloom disabled): 31 secs -> 322.6 docs/sec; CPU 39.29%; iowait 0.01%; heap 4.76 GB; index size 396 KB; secs per 1k docs: 12 1 2 1 1 1 2 1 4 2
  ES 1.3:                   20 secs -> 500 docs/sec; CPU 54.74%; iowait 0.02%; heap 3.06 GB; index size 133 MB; secs per 1k docs: 2 2 1 2 2 3 2 2 2 1
  Solr:                     12 secs -> 833 docs/sec; CPU 28.62%; heap 9.88 GB; secs per 1k docs: 1 1 1 1 2 1 1 1 1 2

Scenario 2: 100k distinct fields
  ES:                       17 mins 44 secs -> 9.4 docs/sec; CPU 54.73%; iowait 0.02%; heap 47.99 GB; index size 75 MB; secs per 1k docs: 97 183 196 147 109 89 87 49 66 40
  ES (_all/bloom disabled): 14 mins 24 secs -> 11.6 docs/sec; CPU 52.30%; iowait 0.02%; heap not reported; index size 1.5 MB; secs per 1k docs: 93 153 151 112 84 65 61 53 51 41
  ES 1.3:                   1 min 24 secs -> 119 docs/sec; CPU 47.67%; iowait 0.12%; heap 8.66 GB; index size 163 MB; secs per 1k docs: 9 14 12 12 8 8 5 7 5 4
  Solr:                     13 secs -> 769 docs/sec; CPU 29.43%; heap 9.84 GB; secs per 1k docs: 2 1 1 1 1 1 1 1 2 2

Scenario 3: 1M distinct fields
  ES:                       183 mins 8 secs -> 0.9 docs/sec; CPU 40.47%; heap 47.99 GB; secs per 1k docs: 133 422 701 958 989 1322 1622 1615 1630 1594
  ES (_all/bloom disabled): no result reported
  ES 1.3:                   11 mins 9 secs -> 15 docs/sec; CPU 41.45%; iowait 0.07%; heap 36.12 GB; index size 163 MB; secs per 1k docs: 12 24 38 55 70 86 106 117 83 78
  Solr:                     15 secs -> 666.7 docs/sec; CPU 45.10%; heap 9.64 GB; secs per 1k docs: 2 1 1 1 1 2 1 1 3 2

Best Regards
Maco

On Saturday, July 5, 2014 11:46:59 PM UTC+8, kimchy wrote:
>
> Heya, I
worked a bit on it, and 1.x (the upcoming 1.3) now has some significant
> perf improvements for this case (including Lucene-side improvements that
> are in ES for now but will be in the next Lucene version). Those include:
>
> 6648: https://github.com/elasticsearch/elasticsearch/pull/6648
> 6714: https://github.com/elasticsearch/elasticsearch/pull/6714
> 6707: https://github.com/elasticsearch/elasticsearch/pull/6707
>
> It would be interesting if you could run the tests again with the 1.x
> branch. Note also: please use the default ES features for now, with no
> disabling of flushing and such.
>
> On Friday, June 13, 2014 7:57:23 AM UTC+2, Maco Ma wrote:
>>
>> I am trying to measure the performance of ingesting documents that have
>> lots of fields.
>>
>> The latest elasticsearch 1.2.1:
>> Total docs count: 10k (a small set, definitely)
>> ES_HEAP_SIZE: 48G
>> settings:
>> {"doc":{"settings":{"index":{"uuid":"LiWHzE5uQrinYW1wW4E3nA","number_of_replicas":"0","translog":{"disable_flush":"true"},"number_of_shards":"5","refresh_interval":"-1","version":{"created":"1020199"}}}}}
>>
>> mappings:
>> {"doc":{"mappings":{"type":{"dynamic_templates":[{"t1":{"mapping":{"store":false,"norms":{"enabled":false},"type":"string"},"match":"*_ss"}},{"t2":{"mapping":{"store":false,"type":"date"},"match":"*_dt"}},{"t3":{"mapping":{"store":false,"type":"integer"},"match":"*_i"}}],"_source":{"enabled":false},"properties":{}}}}}
>>
>> All fields in the documents match the templates in the mappings.
>>
>> Since I disabled flush & refresh, I submitted a flush command (followed
>> by an optimize command) from the client program every 10 seconds. (I
>> also tried a 10-minute interval and got similar results.)
>>
>> Scenario 0 - 10k docs with 1,000 distinct fields:
>> Ingestion took 12 secs. Only 1.08 GB of heap was used (this counts only
>> the used heap memory).
>>
>> Scenario 1 - 10k docs with 10k distinct fields (10x the fields of
>> scenario 0):
>> This time ingestion took 29 secs.
>> Only 5.74 GB of heap was used.
>>
>> Not sure why the performance degrades so sharply.
>>
>> If I try to ingest docs with 100k distinct fields, it takes 17 mins 44
>> secs. We only have 10k docs in total, and I am not sure why ES performs
>> so badly.
>>
>> Can anyone give suggestions to improve the performance?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/3a2572a6-c97d-47f5-a801-b1d933c22990%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
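[Editor's note: the ingestion pattern discussed in this thread (10k documents whose field names match the *_ss dynamic template, indexed in 1k-doc batches with the per-batch time recorded) can be sketched roughly as below. This is a hypothetical driver, not the script that produced the numbers above; the bulk endpoint URL, the index/type names, and the choice of 10 fields per document are all assumptions.]

```python
import json
import random
import time
import urllib.request

def make_doc(field_pool_size, fields_per_doc=10):
    """Build one document whose field names are drawn from a pool of
    `field_pool_size` distinct names, all matching the *_ss dynamic template.
    Across many docs, the index accumulates up to `field_pool_size` fields."""
    names = random.sample(range(field_pool_size),
                          min(fields_per_doc, field_pool_size))
    return {"f%d_ss" % n: "value-%d" % n for n in names}

def bulk_body(docs, index="doc", doc_type="type"):
    """Render docs in Elasticsearch bulk-API format: one action line
    followed by one source line per document, newline-delimited."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

def ingest(total_docs=10_000, pool=1_000, batch=1_000,
           url="http://localhost:9200/_bulk"):
    """Index docs in batches of `batch`, printing seconds per batch
    (the 'secs per 1k docs' figure quoted in the thread)."""
    timings = []
    for _ in range(0, total_docs, batch):
        body = bulk_body(make_doc(pool) for _ in range(batch)).encode("utf-8")
        t0 = time.time()
        req = urllib.request.Request(
            url, data=body,
            headers={"Content-Type": "application/x-ndjson"})
        urllib.request.urlopen(req).read()
        timings.append(round(time.time() - t0))
    print(timings)
```

Running `ingest` with pool sizes of 1,000 / 10k / 100k / 1M would correspond to scenarios 0 through 3: the document count and batch size stay fixed, and only the number of distinct field names (and hence the mapping size) grows.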