Added the Solr benchmark as well. Results by number of different metadata fields:

Scenario 0: 1000 different metadata fields

ES (_all and codec bloom filter disabled):
13 secs -> 769 docs/sec
CPU: 23.68%
iowait: 0.01%
Heap: 1.31G
Index Size: 248K
Ingestion speed change (secs per 1k docs): 2 1 1 1 1 1 1 1 2 1

ES (same settings, ingestion & query concurrently):
14 secs -> 714 docs/sec
CPU: 27.51%
iowait: 0.03%
Heap: 1.27G
Index Size: 304K
Ingestion speed change (secs per 1k docs): 3 1 1 1 1 1 1 2 2 1

Solr:
13 secs -> 769 docs/sec
CPU: 28.85%
Heap: 9.39G
Ingestion speed change (secs per 1k docs): 2 1 1 1 1 1 1 1 2 2

Solr (ingestion & query concurrently):
14 secs -> 714 docs/sec
CPU: 37.02%
Heap: 10G
Ingestion speed change (secs per 1k docs): 2 2 1 1 1 1 2 2 1 1

Scenario 1: 10k different metadata fields

ES (_all and codec bloom filter disabled):
31 secs -> 322.6 docs/sec
CPU: 39.29%
iowait: 0.01%
Heap: 4.76G
Index Size: 396K
Ingestion speed change (secs per 1k docs): 12 1 2 1 1 1 2 1 4 2

ES (same settings, ingestion & query concurrently):
35 secs -> 285 docs/sec
CPU: 42.46%
iowait: 0.01%
Heap: 5.14G
Index Size: 336K
Ingestion speed change (secs per 1k docs): 13 2 1 1 2 1 1 4 1 2

Solr:
12 secs -> 833 docs/sec
CPU: 28.62%
Heap: 9.88G
Ingestion speed change (secs per 1k docs): 1 1 1 1 2 1 1 1 1 2

Solr (ingestion & query concurrently):
16 secs -> 625 docs/sec
CPU: 34.07%
Heap: 10G
Ingestion speed change (secs per 1k docs): 2 2 1 1 1 1 2 2 2 2

Several sample queries for Solr:

curl -s 'http://localhost:8983/solr/collection2/query?rows=0&q=field282_ss:f*'
curl -s 'http://localhost:8983/solr/collection2/query?rows=0&q=field989_dt:\[2012-3-06T01%3A15%3A51Z%20TO%20NOW\]'
curl -s 'http://localhost:8983/solr/collection2/query?rows=0&q=field363_i:\[0%20TO%20177\]'

Filter queries:

curl -s 'http://localhost:8983/solr/collection2/query?rows=0&q=*&fq=field118_i:\[0%20TO%2029\]'
curl -s 'http://localhost:8983/solr/collection2/query?rows=0&q=*&fq=field91_dt:\[2012-1-06T01%3A15%3A51Z%20TO%20NOW\]'
curl -s 'http://localhost:8983/solr/collection2/query?rows=0&q=*&fq=field879_ss:f*'
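For anyone comparing side by side, the rough Elasticsearch equivalents of those Solr queries would look like this (a sketch only; same index/type and same query patterns as in my query script quoted below, so the exact bodies may need adjusting):

curl -s 'http://localhost:9200/doc/type/_search' -d '{"size":0,"query":{"query_string":{"fields":["field282_ss"],"query":"f*"}}}'
curl -s 'http://localhost:9200/doc/type/_search' -d '{"size":0,"query":{"filtered":{"query":{"match_all":{}},"filter":{"range":{"field363_i":{"gte":0,"lte":177}}}}}}'
curl -s 'http://localhost:9200/doc/type/_search' -d '{"size":0,"query":{"filtered":{"query":{"match_all":{}},"filter":{"range":{"field989_dt":{"from":"2012-03-06T01:15:51Z"}}}}}}'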
q(range":{ "field). > int(rand($fieldNum)).q(_dt":{"from": > "2010-01-).(1+int(rand(31))).q(T02:10:03"}}}}}}'); > } > else > { > $cstr = $fstr. > q(regexp":{"field).int(rand($fieldNum)).q(_ss":"f.*"}}}}}'); > } > print $cstr."\n"; > print `$cstr`."\n"; > } > } > > > Maco > > On Wednesday, June 25, 2014 1:04:08 AM UTC+8, Cindy Hsin wrote: >> >> Looks like the memory usage increased a lot with 10k fields with these >> two parameter disabled. >> >> Based on the experiment we have done, looks like ES have abnormal memory >> usage and performance degradation when number of fields are large (ie. >> 10k). Where Solr memory usage and performance remains for the large number >> fields. >> >> If we are only looking at 10k fields scenario, is there a way for ES to >> make the ingest performance better (perhaps via a bug fix)? Looking at the >> performance number, I think this abnormal memory usage & performance drop >> is most likely a bug in ES layer. If this is not technically feasible then >> we'll report back that we have checked with ES experts and confirmed that >> there is no way for ES to provide a fix to address this issue. The solution >> Mike suggestion sounds like a workaround (ie combine multiple fields into >> one field to reduce the large number of fields). I can run it by our team >> but not sure if this will fly. >> >> I have also asked Maco to do one more benchmark (where search and ingest >> runs concurrently) for both ES and Solr to check whether there is any >> performance degradation for Solr when search and ingest happens >> concurrently. I think this is one point that Mike mentioned, right? Even >> with Solr, you think we will hit some performance issue with large fields >> when ingest and query runs concurrently. >> >> Thanks! >> Cindy >> >> On Thursday, June 12, 2014 10:57:23 PM UTC-7, Maco Ma wrote: >>> >>> I try to measure the performance of ingesting the documents having lots >>> of fields. >>> >>> >>> The latest elasticsearch 1.2.1: >>> Total docs count: 10k (a small set definitely) >>> ES_HEAP_SIZE: 48G >>> settings: >>> >>> {"doc":{"settings":{"index":{"uuid":"LiWHzE5uQrinYW1wW4E3nA","number_of_replicas":"0","translog":{"disable_flush":"true"},"number_of_shards":"5","refresh_interval":"-1","version":{"created":"1020199"}}}}} >>> >>> mappings: >>> >>> {"doc":{"mappings":{"type":{"dynamic_templates":[{"t1":{"mapping":{"store":false,"norms":{"enabled":false},"type":"string"},"match":"*_ss"}},{"t2":{"mapping":{"store":false,"type":"date"},"match":"*_dt"}},{"t3":{"mapping":{"store":false,"type":"integer"},"match":"*_i"}}],"_source":{"enabled":false},"properties":{}}}}} >>> >>> All fields in the documents mach the templates in the mappings. >>> >>> Since I disabled the flush & refresh, I submitted the flush command >>> (along with optimize command after it) in the client program every 10 >>> seconds. (I tried the another interval 10mins and got the similar results) >>> >>> Scenario 0 - 10k docs have 1000 different fields: >>> Ingestion took 12 secs. Only 1.08G heap mem is used(only states the >>> used heap memory). >>> >>> >>> Scenario 1 - 10k docs have 10k different fields(10 times fields compared >>> with scenario0): >>> This time ingestion took 29 secs. Only 5.74G heap mem is used. >>> >>> Not sure why the performance degrades sharply. >>> >>> If I try to ingest the docs having 100k different fields, it will take >>> 17 mins 44 secs. We only have 10k docs totally and not sure why ES perform >>> so badly. >>> >>> Anyone can give suggestion to improve the performance? 