Hi, Mark:

We are doing single-document ingestion and ran a performance comparison 
between Solr and Elasticsearch (ES).
ES performance degrades dramatically as we increase the number of metadata 
fields, whereas Solr performance remains the same. 
The test was run on a very small data set (i.e. 10k documents; the index 
size is only 75 MB). The machine is a high-spec machine with 48 GB of 
memory.
You can see ES performance drop by 50% even though the machine has plenty 
of memory. ES consumes all of the machine's memory when the number of 
metadata fields increases to 100k. 
This behavior seems abnormal since the data set is really tiny.

We also tried larger data sets (i.e. 100k and 1 million documents). ES 
throws an OOM (out of memory) error in scenario 2 with the 1 million 
document set. 
We want to know whether this is a bug in ES and/or whether there is any 
workaround (config step) we can use to eliminate the performance 
degradation. 
Currently ES performance does not meet the customer requirement, so we want 
to see if there is any way we can bring ES performance to the same level as 
Solr.

Below are the configuration settings and benchmark results for the 10k 
document set.
Scenario 0 means there are 1000 different metadata fields in the system.
Scenario 1 means there are 10k different metadata fields in the system.
Scenario 2 means there are 100k different metadata fields in the system.
Scenario 3 means there are 1M different metadata fields in the system.
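
A minimal sketch of how test documents for these scenarios might be 
generated. This is an illustration only, not the actual ingestion script: 
the `meta_<n>` field naming, the `fields_per_doc` value of 10, and the 
helper names are all assumptions, since the message does not say how many 
metadata fields each document carries.

```python
import random

# Number of distinct metadata field names per scenario (from the list above).
SCENARIOS = {0: 1_000, 1: 10_000, 2: 100_000, 3: 1_000_000}


def make_document(doc_id, total_fields, fields_per_doc=10, seed=0):
    """Build one document whose metadata field names are drawn from a pool
    of `total_fields` distinct names (hypothetical scheme: meta_<n>)."""
    rng = random.Random(seed + doc_id)  # deterministic per document
    field_ids = rng.sample(range(total_fields), fields_per_doc)
    doc = {"id": doc_id, "body": "sample text %d" % doc_id}
    for n in field_ids:
        doc["meta_%d" % n] = "value_%d" % n
    return doc


def distinct_fields(num_docs, total_fields, fields_per_doc=10):
    """Count how many distinct metadata field names a batch actually uses."""
    seen = set()
    for i in range(num_docs):
        doc = make_document(i, total_fields, fields_per_doc)
        seen.update(k for k in doc if k.startswith("meta_"))
    return len(seen)
```

With this kind of generator, the number of distinct field names the index 
has to track grows with the scenario even though the number of documents 
stays at 10k.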

   - disable hard commit & soft commit, and use a *client* to commit (ES & 
   Solr) every 10 seconds
      - ES: flush and refresh are disabled
      - Solr: autoSoftCommit is disabled
   - monitor load on the system (CPU, memory, etc.)
   - monitor the ingestion speed (is there any degradation over time?)
   - new ES config: new_ES_config.sh 
   <https://stbeehive.oracle.com/content/dav/st/Cloud%20Search/Documents/new_ES_config.sh>
   - new ES ingestion: new_ES_ingest_threads.pl 
   <https://stbeehive.oracle.com/content/dav/st/Cloud%20Search/Documents/new_ES_ingest_threads.pl>
   - new Solr ingestion: new_Solr_ingest_threads.pl 
   <https://stbeehive.oracle.com/content/dav/st/Cloud%20Search/Documents/new_Solr_ingest_threads.pl>
   - flush interval: 10s
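
The client-driven commit described in the list above could look roughly 
like the following sketch. It is not taken from the linked scripts: 
`PeriodicCommitter` and `commit_fn` are hypothetical names, and `commit_fn` 
stands in for whatever request the client sends (for example, a refresh or 
flush request for ES, or an update request with commit=true for Solr).

```python
import time


class PeriodicCommitter:
    """Call `commit_fn` at most once per `interval` seconds.

    The ingestion loop calls `after_index()` after each document; the
    commit fires only when at least `interval` seconds have elapsed
    since the previous one.
    """

    def __init__(self, commit_fn, interval=10.0, clock=time.monotonic):
        self.commit_fn = commit_fn
        self.interval = interval
        self.clock = clock              # injectable so it can be tested
        self.last_commit = clock()
        self.commits = 0

    def after_index(self):
        now = self.clock()
        if now - self.last_commit >= self.interval:
            self.commit_fn()
            self.last_commit = now
            self.commits += 1
```

Keeping the commit on the client side means both engines see the same 
commit cadence, so differences in ingestion speed are less likely to come 
from different auto-commit defaults.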


Results by number of different metadata fields (ES vs. Solr):

Scenario 0: 1000
   ES:
      12 secs -> 833 docs/sec
      CPU: 30.24%
      Heap: 1.08G
      time (secs) for each 1k docs: 3 1 1 1 1 1 0 1 2 1
      index size: 36M
      iowait: 0.02%
   Solr:
      13 secs -> 769 docs/sec
      CPU: 28.85%
      Heap: 9.39G
      time (secs) for each 1k docs: 2 1 1 1 1 1 1 1 2 2

Scenario 1: 10k
   ES:
      29 secs -> 345 docs/sec
      CPU: 40.83%
      Heap: 5.74G
      time (secs) for each 1k docs: 14 2 2 2 1 2 2 1 2 1
      index size: 36M
      iowait: 0.02%
   Solr:
      12 secs -> 833 docs/sec
      CPU: 28.62%
      Heap: 9.88G
      time (secs) for each 1k docs: 1 1 1 1 2 1 1 1 1 2

Scenario 2: 100k
   ES:
      17 mins 44 secs -> 9.4 docs/sec
      CPU: 54.73%
      Heap: 47.99G
      time (secs) for each 1k docs: 97 183 196 147 109 89 87 49 66 40
      index size: 75M
      iowait: 0.02%
   Solr:
      13 secs -> 769 docs/sec
      CPU: 29.43%
      Heap: 9.84G
      time (secs) for each 1k docs: 2 1 1 1 1 1 1 1 2 2

Scenario 3: 1M
   ES:
      183 mins 8 secs -> 0.9 docs/sec
      CPU: 40.47%
      Heap: 47.99G
      time (secs) for each 1k docs: 133 422 701 958 989 1322 1622 1615 1630 1594
   Solr:
      15 secs -> 666.7 docs/sec
      CPU: 45.10%
      Heap: 9.64G
      time (secs) for each 1k docs: 2 1 1 1 1 2 1 1 3 2
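
For reference, the docs/sec figures above follow from dividing the 10k 
documents by the total elapsed time. The short sketch below reproduces 
that arithmetic for the ES runs; the `throughput` helper and the variable 
names are only illustrative.

```python
NUM_DOCS = 10_000


def throughput(num_docs, elapsed_secs):
    """Documents ingested per second."""
    return num_docs / elapsed_secs


# Per-1k-document timings (secs) reported for ES in scenarios 2 and 3.
es_scenario2 = [97, 183, 196, 147, 109, 89, 87, 49, 66, 40]
es_scenario3 = [133, 422, 701, 958, 989, 1322, 1622, 1615, 1630, 1594]

# The per-1k timings add up to roughly the reported wall-clock totals.
total2 = sum(es_scenario2)   # 1063 secs, reported as 17 mins 44 secs
total3 = sum(es_scenario3)   # 10986 secs, reported as 183 mins 8 secs

rate0 = throughput(NUM_DOCS, 12)             # scenario 0
rate1 = throughput(NUM_DOCS, 29)             # scenario 1
rate2 = throughput(NUM_DOCS, 17 * 60 + 44)   # scenario 2
rate3 = throughput(NUM_DOCS, 183 * 60 + 8)   # scenario 3
```

Rounded, these give about 833, 345, 9.4 and 0.9 docs/sec, matching the 
reported ES figures.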

Thanks!
Cindy
