Hi Kimchy,

I reran the benchmark using ES 1.3 with default settings (just disabling 
_source & _all), and it shows great progress on performance. However, 
Solr still outperforms ES 1.3:
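For reference, by "disabling _source & _all" I mean mapping settings along these lines (a sketch; "type" stands in for the actual mapping type name):

```json
{
  "type": {
    "_source": { "enabled": false },
    "_all":    { "enabled": false }
  }
}
```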
Setups compared, per number of different metadata fields: ES 1.2.1 
(default), ES 1.2.1 with _all & codec bloom filter disabled, ES 1.3, and 
Solr. "time (secs) per 1k docs" lists how long each successive batch of 
1k documents took.

Scenario 0: 1,000 different fields

ES 1.2.1: 12 secs -> 833 docs/sec
  CPU: 30.24%, iowait: 0.02%, Heap: 1.08G, Index size: 36MB
  time (secs) per 1k docs: 3 1 1 1 1 1 0 1 2 1

ES 1.2.1, _all & bloom filter disabled: 13 secs -> 769 docs/sec
  CPU: 23.68%, iowait: 0.01%, Heap: 1.31G, Index size: 248KB
  time (secs) per 1k docs: 2 1 1 1 1 1 1 1 2 1

ES 1.3: 13 secs -> 769 docs/sec
  CPU: 44.22%, iowait: 0.01%, Heap: 1.38G, Index size: 69MB
  time (secs) per 1k docs: 2 1 1 1 1 1 2 0 2 2

Solr: 13 secs -> 769 docs/sec
  CPU: 28.85%, Heap: 9.39G
  time (secs) per 1k docs: 2 1 1 1 1 1 1 1 2 2

Scenario 1: 10k different fields

ES 1.2.1: 29 secs -> 345 docs/sec
  CPU: 40.83%, iowait: 0.02%, Heap: 5.74G, Index size: 36MB
  time (secs) per 1k docs: 14 2 2 2 1 2 2 1 2 1

ES 1.2.1, _all & bloom filter disabled: 31 secs -> 322.6 docs/sec
  CPU: 39.29%, iowait: 0.01%, Heap: 4.76G, Index size: 396KB
  time (secs) per 1k docs: 12 1 2 1 1 1 2 1 4 2

ES 1.3: 20 secs -> 500 docs/sec
  CPU: 54.74%, iowait: 0.02%, Heap: 3.06G, Index size: 133MB
  time (secs) per 1k docs: 2 2 1 2 2 3 2 2 2 1

Solr: 12 secs -> 833 docs/sec
  CPU: 28.62%, Heap: 9.88G
  time (secs) per 1k docs: 1 1 1 1 2 1 1 1 1 2

Scenario 2: 100k different fields

ES 1.2.1: 17 mins 44 secs -> 9.4 docs/sec
  CPU: 54.73%, iowait: 0.02%, Heap: 47.99G, Index size: 75MB
  time (secs) per 1k docs: 97 183 196 147 109 89 87 49 66 40

ES 1.2.1, _all & bloom filter disabled: 14 mins 24 secs -> 11.6 docs/sec
  CPU: 52.30%, iowait: 0.02%, Heap: (not recorded), Index size: 1.5MB
  time (secs) per 1k docs: 93 153 151 112 84 65 61 53 51 41

ES 1.3: 1 min 24 secs -> 119 docs/sec
  CPU: 47.67%, iowait: 0.12%, Heap: 8.66G, Index size: 163MB
  time (secs) per 1k docs: 9 14 12 12 8 8 5 7 5 4

Solr: 13 secs -> 769 docs/sec
  CPU: 29.43%, Heap: 9.84G
  time (secs) per 1k docs: 2 1 1 1 1 1 1 1 2 2

Scenario 3: 1M different fields

ES 1.2.1: 183 mins 8 secs -> 0.9 docs/sec
  CPU: 40.47%, Heap: 47.99G
  time (secs) per 1k docs: 133 422 701 958 989 1322 1622 1615 1630 1594

ES 1.2.1, _all & bloom filter disabled: (no result reported)

ES 1.3: 11 mins 9 secs -> 15 docs/sec
  CPU: 41.45%, iowait: 0.07%, Heap: 36.12G, Index size: 163MB
  time (secs) per 1k docs: 12 24 38 55 70 86 106 117 83 78

Solr: 15 secs -> 666.7 docs/sec
  CPU: 45.10%, Heap: 9.64G
  time (secs) per 1k docs: 2 1 1 1 1 2 1 1 3 2
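For anyone who wants to reproduce the setup: the exact generator code was not posted, so this is a minimal sketch under the assumption that each of the 10k documents carries a handful of fields drawn round-robin from a pool of N distinct names matching the *_ss/*_dt/*_i dynamic templates (field names, values, and the fields-per-doc count are illustrative):

```python
def make_docs(num_docs=10000, num_fields=1000, fields_per_doc=10):
    """Build documents whose field names are drawn round-robin from a pool
    of `num_fields` distinct names, so the whole pool gets exercised."""
    suffixes = ["_ss", "_dt", "_i"]  # match the dynamic templates t1/t2/t3
    pool = ["f%d%s" % (i, suffixes[i % 3]) for i in range(num_fields)]
    docs, k = [], 0
    for _ in range(num_docs):
        doc = {}
        for _ in range(fields_per_doc):
            name = pool[k % num_fields]
            k += 1
            if name.endswith("_i"):
                doc[name] = k                       # integer template
            elif name.endswith("_dt"):
                doc[name] = "2014-07-05T00:00:00Z"  # date template
            else:
                doc[name] = "value-%d" % k          # string template
        docs.append(doc)
    return docs

# Scenario 0 uses 1,000 distinct fields; bump num_fields for scenarios 1-3.
docs = make_docs(num_docs=10000, num_fields=1000)
print(len(docs), len({n for d in docs for n in d}))  # 10000 1000
```

Each document then goes through the bulk API, with the flush/optimize cadence described in the original post below.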

 

Best Regards
Maco

On Saturday, July 5, 2014 11:46:59 PM UTC+8, kimchy wrote:
>
> Heya, I worked a bit on it, and 1.x (the upcoming 1.3) now has some 
> significant perf improvements for this case (including Lucene-side 
> improvements that are in ES for now but will be in the next Lucene 
> version). Those include:
>
> 6648: https://github.com/elasticsearch/elasticsearch/pull/6648
> 6714: https://github.com/elasticsearch/elasticsearch/pull/6714
> 6707: https://github.com/elasticsearch/elasticsearch/pull/6707
>
> It would be interesting if you could run the tests again with the 1.x 
> branch. Also, please use default features in ES for now: no disabling 
> flushing and such.
>
> On Friday, June 13, 2014 7:57:23 AM UTC+2, Maco Ma wrote:
>>
>> I am trying to measure the performance of ingesting documents that have 
>> lots of fields.
>>
>>
>> The latest elasticsearch 1.2.1:
>> Total docs count: 10k (a small set definitely)
>> ES_HEAP_SIZE: 48G
>> settings:
>>
>> {
>>   "doc": {
>>     "settings": {
>>       "index": {
>>         "uuid": "LiWHzE5uQrinYW1wW4E3nA",
>>         "number_of_replicas": "0",
>>         "translog": { "disable_flush": "true" },
>>         "number_of_shards": "5",
>>         "refresh_interval": "-1",
>>         "version": { "created": "1020199" }
>>       }
>>     }
>>   }
>> }
>>
>> mappings:
>>
>> {
>>   "doc": {
>>     "mappings": {
>>       "type": {
>>         "dynamic_templates": [
>>           { "t1": { "mapping": { "store": false, "norms": { "enabled": false }, "type": "string" }, "match": "*_ss" } },
>>           { "t2": { "mapping": { "store": false, "type": "date" }, "match": "*_dt" } },
>>           { "t3": { "mapping": { "store": false, "type": "integer" }, "match": "*_i" } }
>>         ],
>>         "_source": { "enabled": false },
>>         "properties": {}
>>       }
>>     }
>>   }
>> }
>>
>> All fields in the documents match the templates in the mappings.
>>
>> Since I disabled flush & refresh, I submit a flush command (followed by 
>> an optimize command) from the client program every 10 seconds. (I also 
>> tried a 10-minute interval and got similar results.)
>>
>> Scenario 0 - 10k docs with 1,000 different fields:
>> Ingestion took 12 secs. Only 1.08G of heap memory was used (the figure 
>> counts used heap only).
>>
>>
>> Scenario 1 - 10k docs with 10k different fields (10 times the fields of 
>> scenario 0):
>> This time ingestion took 29 secs, and only 5.74G of heap was used.
>>
>> I am not sure why the performance degrades so sharply.
>>
>> If I try to ingest docs with 100k different fields, it takes 17 mins 
>> 44 secs. We only have 10k docs in total, and I am not sure why ES 
>> performs so badly.
>>
>> Can anyone give suggestions for improving the performance?
>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/3a2572a6-c97d-47f5-a801-b1d933c22990%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
