I ran the benchmark with search and ingest running concurrently. The 
results are pasted below:
Number of different metadata fields: Scenario 0: 1000

                          ES with _all/codec       ES with disabled params
                          bloom filter disabled    (ingestion & query concurrently)
  Ingestion time          13 secs                  14 secs
  Throughput              769 docs/sec             714 docs/sec
  CPU                     23.68%                   27.51%
  iowait                  0.01%                    0.03%
  Heap                    1.31G                    1.27G
  Index size              248K                     304K
  Ingestion speed change  2 1 1 1 1 1 1 1 2 1      3 1 1 1 1 1 1 2 2 1

Number of different metadata fields: Scenario 1: 10k

                          ES with _all/codec       ES with disabled params
                          bloom filter disabled    (ingestion & query concurrently)
  Ingestion time          31 secs                  35 secs
  Throughput              322.6 docs/sec           285 docs/sec
  CPU                     39.29%                   42.46%
  iowait                  0.01%                    0.01%
  Heap                    4.76G                    5.14G
  Index size              396K                     336K
  Ingestion speed change  12 1 2 1 1 1 2 1 4 2     13 2 1 1 2 1 1 4 1 2
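The docs/sec figures above are the total document count divided by the ingestion time. Here is a minimal sketch of that arithmetic. It assumes the 10k-document total stated in the quoted message below, and that the first run of each scenario was ingestion only. The name `docs_per_sec` is just illustrative.

```python
TOTAL_DOCS = 10_000  # assumption: the 10k total from the original benchmark description

def docs_per_sec(seconds, total_docs=TOTAL_DOCS):
    """Average ingestion throughput over the whole run."""
    return total_docs / seconds

# Ingestion times in seconds reported above: (first run, run with concurrent queries)
runs = {
    "1000 fields": (13, 14),
    "10k fields": (31, 35),
}

for name, (first, concurrent) in runs.items():
    base = docs_per_sec(first)
    conc = docs_per_sec(concurrent)
    drop = (1 - conc / base) * 100  # relative throughput loss in percent
    print(f"{name}: {base:.1f} -> {conc:.1f} docs/sec ({drop:.1f}% slower with queries)")
```

Comparing the two runs per scenario this way makes it easier to separate the cost of concurrent queries from the cost of the larger field count.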

I added one more thread to the existing ingestion script to run the queries:
sub query {
  my $qstr = q(curl -s 'http://localhost:9200/doc/type/_search' 
-d'{"query":{"filtered":{"query":{"query_string":{"fields" : [");
  my $fstr = q(curl -s 'http://localhost:9200/doc/type/_search' 
  my $fieldNum =  1000;

  while ($no < $total )
    $tr= int(rand(5));
    if( $tr == 0 )
      $fieldName = "field".int(rand($fieldNum))."_i";
      $fieldValue = "*1*";
    elsif ($tr == 1)
      $fieldName = "field".int(rand($fieldNum))."_dt";
      $fieldValue = "*2*";
      $fieldName = "field".int(rand($fieldNum))."_ss";
      $fieldValue = "f*";

    $cstr = $qstr. "$fieldName" . q("],"query":") . $fieldValue . 
    print $cstr."\n";
    print `$cstr`."\n";

    $tr= int(rand(5));
    if( $tr == 0 )
      $cstr = $fstr. q(range":{ 
"field).int(rand($fieldNum)).q(_i":{"gte":). int(rand(1000)). q(}}}}}}');
    elsif ($tr == 1)
      $cstr = $fstr. q(range":{ "field). 
      $cstr = $fstr. 
    print $cstr."\n";
    print `$cstr`."\n";
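For anyone who wants to reproduce the query mix, here is a minimal Python sketch of the request bodies the script generates. The `query_string` and `range` shapes come from the fragments visible above. Several things are assumptions: the closing braces of the `query_string` body, the `filtered`/`filter` wrapper around `range`, and the fall-through to the `_ss` case. The helper names are hypothetical, and only the branches visible above are covered.

```python
import json
import random

FIELD_NUM = 1000  # matches $fieldNum in the script above

def query_string_body(rng):
    """Wildcard query_string against one randomly chosen field."""
    tr = rng.randrange(5)  # same role as int(rand(5)) in the Perl script
    if tr == 0:
        field, value = f"field{rng.randrange(FIELD_NUM)}_i", "*1*"
    elif tr == 1:
        field, value = f"field{rng.randrange(FIELD_NUM)}_dt", "*2*"
    else:
        # Assumption: every other case falls through to the string field.
        field, value = f"field{rng.randrange(FIELD_NUM)}_ss", "f*"
    return {"query": {"filtered": {"query": {
        "query_string": {"fields": [field], "query": value}}}}}

def range_filter_body(rng):
    """Range filter on a random integer field (the only branch shown in full)."""
    field = f"field{rng.randrange(FIELD_NUM)}_i"
    # Assumption: the range sits under filtered -> filter.
    return {"query": {"filtered": {"filter": {
        "range": {field: {"gte": rng.randrange(1000)}}}}}}

if __name__ == "__main__":
    rng = random.Random()
    # Each printed line is a body that could be sent with: curl -s <url> -d '<body>'
    print(json.dumps(query_string_body(rng)))
    print(json.dumps(range_filter_body(rng)))
```

Building the body as a dictionary and serializing it with `json.dumps` avoids the brace counting that the string concatenation in the script needs.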


On Wednesday, June 25, 2014 1:04:08 AM UTC+8, Cindy Hsin wrote:
> Looks like the memory usage increased a lot with 10k fields with these two 
> parameters disabled.
> Based on the experiments we have done, it looks like ES has abnormal memory 
> usage and performance degradation when the number of fields is large (i.e. 
> 10k), whereas Solr's memory usage and performance hold up with a large 
> number of fields. 
> If we are only looking at the 10k fields scenario, is there a way for ES to 
> make the ingest performance better (perhaps via a bug fix)? Looking at the 
> performance numbers, I think this abnormal memory usage & performance drop 
> is most likely a bug in the ES layer. If this is not technically feasible then 
> we'll report back that we have checked with ES experts and confirmed that 
> there is no way for ES to provide a fix to address this issue. The solution 
> Mike suggested sounds like a workaround (i.e. combining multiple fields into 
> one field to reduce the large number of fields). I can run it by our team 
> but I'm not sure if this will fly.
> I have also asked Maco to do one more benchmark (where search and ingest 
> run concurrently) for both ES and Solr to check whether there is any 
> performance degradation for Solr when search and ingest happen 
> concurrently. I think this is one point that Mike mentioned, right? Even 
> with Solr, you think we will hit some performance issue with a large number 
> of fields when ingest and query run concurrently.
> Thanks!
> Cindy
> On Thursday, June 12, 2014 10:57:23 PM UTC-7, Maco Ma wrote:
>> I tried to measure the performance of ingesting documents that have lots 
>> of fields.
>> The latest elasticsearch 1.2.1:
>> Total docs count: 10k (a small set definitely)
>> settings:
>> {"doc":{"settings":{"index":{"uuid":"LiWHzE5uQrinYW1wW4E3nA","number_of_replicas":"0","translog":{"disable_flush":"true"},"number_of_shards":"5","refresh_interval":"-1","version":{"created":"1020199"}}}}}
>> mappings:
>> {"doc":{"mappings":{"type":{"dynamic_templates":[{"t1":{"mapping":{"store":false,"norms":{"enabled":false},"type":"string"},"match":"*_ss"}},{"t2":{"mapping":{"store":false,"type":"date"},"match":"*_dt"}},{"t3":{"mapping":{"store":false,"type":"integer"},"match":"*_i"}}],"_source":{"enabled":false},"properties":{}}}}}
>> All fields in the documents match the templates in the mappings.
>> Since I disabled flush & refresh, I submitted the flush command 
>> (followed by the optimize command) from the client program every 10 
>> seconds. (I also tried a 10-minute interval and got similar results.)
>> Scenario 0 - 10k docs have 1000 different fields:
>> Ingestion took 12 secs.  Only 1.08G heap mem is used (this counts only the 
>> used heap memory).
>> Scenario 1 - 10k docs have 10k different fields (10 times as many fields as 
>> scenario 0):
>> This time ingestion took 29 secs.   Only 5.74G heap mem is used.
>> Not sure why the performance degrades so sharply.
>> If I try to ingest docs having 100k different fields, it takes 17 
>> mins 44 secs.  We only have 10k docs in total and I am not sure why ES 
>> performs so badly. 
>> Can anyone give suggestions to improve the performance?
