I ran the benchmark where search and ingest run concurrently. The results are 
pasted below:
Scenario 0: 1000 different metadata fields

  ES with _all/codec bloom filter disabled:
    13 secs -> 769 docs/sec
    CPU: 23.68%
    iowait: 0.01%
    Heap: 1.31G
    Index size: 248K
    Ingestion speed change: 2 1 1 1 1 1 1 1 2 1

  ES with the same params disabled (ingestion & query running concurrently):
    14 secs -> 714 docs/sec
    CPU: 27.51%
    iowait: 0.03%
    Heap: 1.27G
    Index size: 304K
    Ingestion speed change: 3 1 1 1 1 1 1 2 2 1

Scenario 1: 10k different metadata fields

  ES with _all/codec bloom filter disabled:
    31 secs -> 322.6 docs/sec
    CPU: 39.29%
    iowait: 0.01%
    Heap: 4.76G
    Index size: 396K
    Ingestion speed change: 12 1 2 1 1 1 2 1 4 2

  ES with the same params disabled (ingestion & query running concurrently):
    35 secs -> 285 docs/sec
    CPU: 42.46%
    iowait: 0.01%
    Heap: 5.14G
    Index size: 336K
    Ingestion speed change: 13 2 1 1 2 1 1 4 1 2


I added one more thread, which issues the queries, to the existing ingestion script:
sub query {
  # Query templates; the JSON body is completed below based on the randomly
  # chosen field type.
  my $qstr = q(curl -s 'http://localhost:9200/doc/type/_search' -d'{"query":{"filtered":{"query":{"query_string":{"fields" : [");
  my $fstr = q(curl -s 'http://localhost:9200/doc/type/_search' -d'{"query":{"filtered":{"query":{"match_all":{}},"filter":{");
  my $fieldNum = 1000;
  my ($tr, $cstr, $fieldName, $fieldValue);

  # $no and $total are shared with the ingestion thread; keep querying until
  # ingestion has finished.
  while ($no < $total)
  {
    # Wildcard query_string search against a randomly picked field.
    $tr = int(rand(5));
    if ($tr == 0)
    {
      $fieldName  = "field" . int(rand($fieldNum)) . "_i";
      $fieldValue = "*1*";
    }
    elsif ($tr == 1)
    {
      $fieldName  = "field" . int(rand($fieldNum)) . "_dt";
      $fieldValue = "*2*";
    }
    else
    {
      $fieldName  = "field" . int(rand($fieldNum)) . "_ss";
      $fieldValue = "f*";
    }

    $cstr = $qstr . $fieldName . q("],"query":") . $fieldValue . q("}}}}}');
    print $cstr . "\n";
    print `$cstr` . "\n";

    # Filtered match_all search with a randomly picked filter type.
    $tr = int(rand(5));
    if ($tr == 0)
    {
      $cstr = $fstr . q(range":{"field) . int(rand($fieldNum)) . q(_i":{"gte":) . int(rand(1000)) . q(}}}}}}');
    }
    elsif ($tr == 1)
    {
      # Zero-pad the day so the timestamp is a well-formed ISO date.
      $cstr = $fstr . q(range":{"field) . int(rand($fieldNum)) . q(_dt":{"from":"2010-01-) . sprintf("%02d", 1 + int(rand(31))) . q(T02:10:03"}}}}}}');
    }
    else
    {
      $cstr = $fstr . q(regexp":{"field) . int(rand($fieldNum)) . q(_ss":"f.*"}}}}}');
    }
    print $cstr . "\n";
    print `$cstr` . "\n";
  }
}
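
The skeleton around it looks roughly like this (a minimal sketch only; ingest() 
and the way the shared $no/$total counters are set up stand in for the existing 
ingestion script, which is not shown here):

use threads;
use threads::shared;

# Shared counters: ingest() increments $no as it indexes documents, and the
# query() sub above keeps issuing searches until $no reaches $total.
our ($no, $total);
share($no);
share($total);
$no    = 0;
$total = 10000;

my $qthr = threads->create(\&query);   # query thread (sub shown above)
my $ithr = threads->create(\&ingest);  # existing ingestion routine (not shown)

$ithr->join();
$qthr->join();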


Maco

On Wednesday, June 25, 2014 1:04:08 AM UTC+8, Cindy Hsin wrote:
>
> Looks like the memory usage increased a lot with 10k fields when these two 
> parameters were disabled.
>
> Based on the experiments we have done, it looks like ES has abnormal memory 
> usage and performance degradation when the number of fields is large (i.e. 
> 10k), whereas Solr's memory usage and performance remain stable with a large 
> number of fields.
>
> If we are only looking at the 10k-field scenario, is there a way for ES to 
> make the ingest performance better (perhaps via a bug fix)? Looking at the 
> performance numbers, I think this abnormal memory usage & performance drop 
> is most likely a bug in the ES layer. If a fix is not technically feasible, 
> we'll report back that we have checked with ES experts and confirmed that 
> there is no way for ES to address this issue. The solution Mike suggested 
> sounds like a workaround (i.e. combine multiple fields into one field to 
> reduce the number of fields). I can run it by our team, but I am not sure 
> it will fly.
>
> I have also asked Maco to do one more benchmark (where search and ingest 
> run concurrently) for both ES and Solr, to check whether there is any 
> performance degradation for Solr when search and ingest happen concurrently. 
> I think this is one point that Mike mentioned, right? Even with Solr, you 
> think we will hit some performance issue with a large number of fields when 
> ingest and query run concurrently.
>
> Thanks!
> Cindy
>
> On Thursday, June 12, 2014 10:57:23 PM UTC-7, Maco Ma wrote:
>>
>> I tried to measure the performance of ingesting documents that have lots 
>> of fields.
>>
>>
>> The latest elasticsearch 1.2.1:
>> Total docs count: 10k (definitely a small set)
>> ES_HEAP_SIZE: 48G
>> settings:
>>
>> {"doc":{"settings":{"index":{"uuid":"LiWHzE5uQrinYW1wW4E3nA","number_of_replicas":"0","translog":{"disable_flush":"true"},"number_of_shards":"5","refresh_interval":"-1","version":{"created":"1020199"}}}}}
>>
>> mappings:
>>
>> {"doc":{"mappings":{"type":{"dynamic_templates":[{"t1":{"mapping":{"store":false,"norms":{"enabled":false},"type":"string"},"match":"*_ss"}},{"t2":{"mapping":{"store":false,"type":"date"},"match":"*_dt"}},{"t3":{"mapping":{"store":false,"type":"integer"},"match":"*_i"}}],"_source":{"enabled":false},"properties":{}}}}}
>>
>> All fields in the documents match the templates in the mappings.
>>
>> Since I disabled flush & refresh, I submitted a flush command (followed by 
>> an optimize command) from the client program every 10 seconds. (I also 
>> tried a 10-minute interval and got similar results.)
>>
>> Scenario 0 - 10k docs with 1000 different fields:
>> Ingestion took 12 secs. Only 1.08G of heap memory was used (this counts 
>> used heap memory only).
>>
>>
>> Scenario 1 - 10k docs with 10k different fields (10x the fields of 
>> Scenario 0):
>> This time ingestion took 29 secs and 5.74G of heap memory was used.
>>
>> I am not sure why the performance degrades so sharply.
>>
>> If I try to ingest docs with 100k different fields, it takes 17 mins 
>> 44 secs. We only have 10k docs in total, and I am not sure why ES performs 
>> so badly.
>>
>> Can anyone give suggestions to improve the performance?
>>
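
Regarding the workaround mentioned above (combining multiple fields into one 
field): a minimal sketch of what that could look like on the ingestion side is 
below. The catch-all field name combined_kv_ss and the name:value token 
encoding are only illustrative assumptions, not something Mike specified.

# Sketch only: collapse many logical fields into one indexed field by encoding
# each of them as a "name:value" token. combined_kv_ss and the encoding are
# illustrative assumptions.
sub build_combined_doc {
  my (%fields) = @_;
  # e.g. (field12_i => 7) becomes the token "field12_i:7"
  my $tokens = join(" ", map { "$_:$fields{$_}" } sort keys %fields);
  return qq({"combined_kv_ss":"$tokens"});
}

my $doc = build_combined_doc(field12_i => 7, field3_ss => "foo");
my $cmd = q(curl -s 'http://localhost:9200/doc/type/' -d') . $doc . q(');
print `$cmd` . "\n";

Queries would then search the single combined_kv_ss field for tokens such as 
field12_i:7 (with the colon escaped in query_string syntax) instead of 
addressing thousands of distinct fields, so the mapping stays small.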
