I ran the benchmark where search and ingest run concurrently. Pasting the results here. For each number of different metadata fields there are two rows: (a) ES with _all and the codec bloom filter disabled, and (b) ES with disabled params, both with ingestion & query running concurrently. (A sketch of the (a) index configuration is at the end of this message.)

Scenario 0: 1000 different metadata fields
(a) 13 secs -> 769 docs/sec; CPU: 23.68%; iowait: 0.01%; Heap: 1.31G; Index Size: 248K; Ingestion speed change: 2 1 1 1 1 1 1 1 2 1
(b) 14 secs -> 714 docs/sec; CPU: 27.51%; iowait: 0.03%; Heap: 1.27G; Index Size: 304K; Ingestion speed change: 3 1 1 1 1 1 1 2 2 1

Scenario 1: 10k different metadata fields
(a) 31 secs -> 322.6 docs/sec; CPU: 39.29%; iowait: 0.01%; Heap: 4.76G; Index Size: 396K; Ingestion speed change: 12 1 2 1 1 1 2 1 4 2
(b) 35 secs -> 285 docs/sec; CPU: 42.46%; iowait: 0.01%; Heap: 5.14G; Index Size: 336K; Ingestion speed change: 13 2 1 1 2 1 1 4 1 2

I added one more thread to the existing ingestion script to run the queries; it executes the subroutine below:

sub query {
    # Request templates: a query_string search on a random field, and a
    # filtered match_all with a range or regexp filter.
    my $qstr = q(curl -s 'http://localhost:9200/doc/type/_search' -d'{"query":{"filtered":{"query":{"query_string":{"fields" : [");
    my $fstr = q(curl -s 'http://localhost:9200/doc/type/_search' -d'{"query":{"filtered":{"query":{"match_all":{}},"filter":{");
    my $fieldNum = 1000;

    # $no and $total are the shared document counters maintained by the
    # ingestion part of the script (not shown); querying stops when
    # ingestion finishes.
    while ($no < $total) {
        # Pick a random field type for the query_string search
        # (0 -> integer, 1 -> date, 2-4 -> string).
        $tr = int(rand(5));
        if ($tr == 0) {
            $fieldName  = "field" . int(rand($fieldNum)) . "_i";
            $fieldValue = "*1*";
        } elsif ($tr == 1) {
            $fieldName  = "field" . int(rand($fieldNum)) . "_dt";
            $fieldValue = "*2*";
        } else {
            $fieldName  = "field" . int(rand($fieldNum)) . "_ss";
            $fieldValue = "f*";
        }
        $cstr = $qstr . $fieldName . q("],"query":") . $fieldValue . q("}}}}}');
        print $cstr . "\n";
        print `$cstr` . "\n";

        # Pick a random filter type (0 -> integer range, 1 -> date range,
        # 2-4 -> regexp on a string field).
        $tr = int(rand(5));
        if ($tr == 0) {
            $cstr = $fstr . q(range":{ "field) . int(rand($fieldNum)) . q(_i":{"gte":) . int(rand(1000)) . q(}}}}}}');
        } elsif ($tr == 1) {
            # Zero-pad the day so the date always parses.
            $cstr = $fstr . q(range":{ "field) . int(rand($fieldNum)) . q(_dt":{"from": "2010-01-) . sprintf("%02d", 1 + int(rand(31))) . q(T02:10:03"}}}}}}');
        } else {
            $cstr = $fstr . q(regexp":{"field) . int(rand($fieldNum)) . q(_ss":"f.*"}}}}}');
        }
        print $cstr . "\n";
        print `$cstr` . "\n";
    }
}

Maco

On Wednesday, June 25, 2014 1:04:08 AM UTC+8, Cindy Hsin wrote:
>
> Looks like the memory usage increased a lot with 10k fields when these two
> parameters are disabled.
>
> Based on the experiments we have done, it looks like ES has abnormal memory
> usage and performance degradation when the number of fields is large (i.e.
> 10k), whereas Solr's memory usage and performance hold up with a large
> number of fields.
>
> If we are only looking at the 10k-fields scenario, is there a way for ES to
> make the ingest performance better (perhaps via a bug fix)? Looking at the
> performance numbers, I think this abnormal memory usage & performance drop
> is most likely a bug in the ES layer. If this is not technically feasible,
> then we'll report back that we have checked with ES experts and confirmed
> that there is no way for ES to provide a fix for this issue. The solution
> Mike suggested sounds like a workaround (i.e. combine multiple fields into
> one field to reduce the large number of fields). I can run it by our team
> but am not sure if this will fly.
>
> I have also asked Maco to do one more benchmark (where search and ingest
> run concurrently) for both ES and Solr, to check whether there is any
> performance degradation for Solr when search and ingest happen
> concurrently. I think this is one point that Mike mentioned, right? Even
> with Solr, you think we will hit some performance issues with a large
> number of fields when ingest and query run concurrently.
>
> Thanks!
> Cindy
>
> On Thursday, June 12, 2014 10:57:23 PM UTC-7, Maco Ma wrote:
>>
>> I tried to measure the performance of ingesting documents that have lots
>> of fields.
>>
>> The latest elasticsearch 1.2.1:
>> Total docs count: 10k (a small set, definitely)
>> ES_HEAP_SIZE: 48G
>>
>> settings:
>> {"doc":{"settings":{"index":{"uuid":"LiWHzE5uQrinYW1wW4E3nA","number_of_replicas":"0","translog":{"disable_flush":"true"},"number_of_shards":"5","refresh_interval":"-1","version":{"created":"1020199"}}}}}
>>
>> mappings:
>> {"doc":{"mappings":{"type":{"dynamic_templates":[{"t1":{"mapping":{"store":false,"norms":{"enabled":false},"type":"string"},"match":"*_ss"}},{"t2":{"mapping":{"store":false,"type":"date"},"match":"*_dt"}},{"t3":{"mapping":{"store":false,"type":"integer"},"match":"*_i"}}],"_source":{"enabled":false},"properties":{}}}}}
>>
>> All fields in the documents match the templates in the mappings.
>>
>> Since I disabled flush & refresh, I submitted the flush command (along
>> with an optimize command after it) from the client program every 10
>> seconds. (I tried another interval, 10 mins, and got similar results.)
>>
>> Scenario 0 - 10k docs with 1000 different fields:
>> Ingestion took 12 secs. Only 1.08G of heap memory was used (this counts
>> only the used heap).
>>
>> Scenario 1 - 10k docs with 10k different fields (10x the fields of
>> Scenario 0):
>> This time ingestion took 29 secs. Only 5.74G of heap memory was used.
>>
>> Not sure why the performance degrades so sharply.
>>
>> If I try to ingest docs with 100k different fields, it takes 17 mins 44
>> secs. We only have 10k docs in total, and I am not sure why ES performs so
>> badly.
>>
>> Can anyone give suggestions to improve the performance?
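For reference, the configuration labelled (a) in the table at the top (_all
disabled, codec bloom filter disabled) can be set at index-creation time
roughly as follows. This is only a sketch: disabling _all in the type mapping
is standard, but the bloom-filter setting name (index.codec.bloom.load here)
varied across 1.x releases, so verify it for your version.

  # Create the index with _all disabled in the type mapping.
  # NOTE: "index.codec.bloom.load" is assumed to be the 1.x setting that
  # controls the _uid bloom filter; check it against your release.
  curl -XPUT 'http://localhost:9200/doc' -d '{
    "settings": {
      "number_of_shards": 5,
      "number_of_replicas": 0,
      "index.codec.bloom.load": false
    },
    "mappings": {
      "type": {
        "_all":    { "enabled": false },
        "_source": { "enabled": false }
      }
    }
  }'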
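The periodic flush/optimize described in my original post quoted above was
issued from the client program roughly along these lines, every 10 seconds,
since automatic flush and refresh are disabled in the index settings:

  # Force a flush, then an optimize, against the benchmark index.
  curl -XPOST 'http://localhost:9200/doc/_flush'
  curl -XPOST 'http://localhost:9200/doc/_optimize'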
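On the workaround Cindy mentions above (combining multiple fields into one
field): a minimal sketch of the idea, assuming the per-document metadata is
concatenated as key:value tokens into a single catch-all string field. The
field name meta_ss and the token format are illustrative, not part of the
benchmark; naming it *_ss would let it reuse the existing string dynamic
template.

  # Index a document whose metadata lives in one combined field instead
  # of thousands of separate fields (names and values are illustrative).
  curl -XPOST 'http://localhost:9200/doc/type/1' -d '{
    "meta_ss": "field123_i:7 field456_dt:2010-01-05 field789_ss:foo"
  }'

  # Search the combined field for a particular key:value token; how
  # precise this is depends on how the field is analyzed.
  curl -s 'http://localhost:9200/doc/type/_search' -d '{
    "query": { "match": { "meta_ss": "field456_dt:2010-01-05" } }
  }'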