I tested roughly your Scenario 2 (100K unique fields, 100 fields per document) with a straight Lucene test (attached, but not sure if the list strips attachments). Net/net I see ~100 docs/sec with one thread ... which is very slow.
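For anyone who wants to reproduce the shape of this workload, here is a minimal sketch of a document generator in plain Java, with no Lucene dependency. It is not the attached ManyLuceneFields.java. The class name, the `field_N` naming scheme, and the random choice of field names are assumptions made for illustration. Only the two sizes (100K unique fields, 100 fields per document) come from the scenario above.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Hypothetical workload generator; NOT the attached ManyLuceneFields.java.
class ManyFieldsWorkload {
    static final int UNIQUE_FIELDS = 100_000; // Scenario 2: 100K unique field names
    static final int FIELDS_PER_DOC = 100;    // fields carried by each document

    /** Builds one synthetic document with FIELDS_PER_DOC distinct field names. */
    static Map<String, String> makeDoc(Random random, int docId) {
        Map<String, String> doc = new LinkedHashMap<>();
        // Keep drawing until we have enough distinct names; repeats are skipped.
        while (doc.size() < FIELDS_PER_DOC) {
            int fieldId = random.nextInt(UNIQUE_FIELDS);
            doc.putIfAbsent("field_" + fieldId, "value_" + docId + "_" + fieldId);
        }
        return doc;
    }

    public static void main(String[] args) {
        Random random = new Random(42); // fixed seed so runs are repeatable
        int docCount = 10_000;
        Set<String> distinctNames = new HashSet<>();
        long start = System.nanoTime();
        for (int docId = 0; docId < docCount; docId++) {
            Map<String, String> doc = makeDoc(random, docId);
            // Hand `doc` to the indexer under test here, one indexed field per entry.
            distinctNames.addAll(doc.keySet());
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d docs, %d distinct field names, %.1f docs/sec (generation only)%n",
                docCount, distinctNames.size(), docCount / seconds);
    }
}
```

The rate printed by `main` covers document generation only. To measure indexing, time the call that hands each document to the indexer instead.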
Lucene stores quite a lot for each unique indexed field name, and it's really a bad idea to plan on having so many unique fields in the index: you'll spend lots of RAM and CPU. Can you describe the wider use case here? Maybe there's a more performant way to achieve it...

On Fri, Jun 13, 2014 at 2:40 PM, Cindy Hsin <cindy.h...@gmail.com> wrote:
> Hi, Mark:
>
> We are doing single document ingestion. We did a performance comparison
> between Solr and Elastic Search (ES).
> The performance for ES degrades dramatically when we increase the metadata
> fields where Solr performance remains the same.
> The performance is done in very small data set (ie. 10k documents, the
> index size is only 75mb). The machine is a high spec machine with 48GB
> memory.
> You can see ES performance drop 50% even when the machine have plenty
> memory. ES consumes all the machine memory when metadata field increased to
> 100k.
> This behavior seems abnormal since the data is really tiny.
>
> We also tried with larger data set (ie. 100k and 1Mil documents), ES throw
> OOW for scenario 2 for 1 Mil doc scenario.
> We want to know whether this is a bug in ES and/or is there any workaround
> (config step) we can use to eliminate the performance degradation.
> Currently ES performance does not meet the customer requirement so we want
> to see if there is anyway we can bring ES performance to the same level as
> Solr.
>
> Below is the configuration setting and benchmark results for 10k document
> set.
> scenario 0 means there are 1000 different metadata fields in the system.
> scenario 1 means there are 10k different metadata fields in the system.
> scenario 2 means there are 100k different metadata fields in the system.
> scenario 3 means there are 1M different metadata fields in the system.
>
> - disable hard-commit & soft commit + use a *client* to do commit (ES
>   & Solr) every 10 second
> - ES: flush, refresh are disabled
> - Solr: autoSoftCommit are disabled
> - monitor load on the system (cpu, memory, etc) or the ingestion speed
>   change over time
> - monitor the ingestion speed (is there any degradation over time?)
> - new ES config: new_ES_config.sh
>   <https://stbeehive.oracle.com/content/dav/st/Cloud%20Search/Documents/new_ES_config.sh>;
>   new ingestion: new_ES_ingest_threads.pl
>   <https://stbeehive.oracle.com/content/dav/st/Cloud%20Search/Documents/new_ES_ingest_threads.pl>
> - new Solr ingestion: new_Solr_ingest_threads.pl
>   <https://stbeehive.oracle.com/content/dav/st/Cloud%20Search/Documents/new_Solr_ingest_threads.pl>
> - flush interval: 10s
>
> Number of different meta data field: ES vs. Solr
>
> Scenario 0: 1000
>   ES:   12 secs -> 833 docs/sec
>         CPU: 30.24%
>         Heap: 1.08G
>         time(secs) for each 1k docs: 3 1 1 1 1 1 0 1 2 1
>         index size: 36M
>         iowait: 0.02%
>   Solr: 13 secs -> 769 docs/sec
>         CPU: 28.85%
>         Heap: 9.39G
>         time(secs) for each 1k docs: 2 1 1 1 1 1 1 1 2 2
>
> Scenario 1: 10k
>   ES:   29 secs -> 345 docs/sec
>         CPU: 40.83%
>         Heap: 5.74G
>         time(secs) for each 1k docs: 14 2 2 2 1 2 2 1 2 1
>         iowait: 0.02%
>         Index Size: 36M
>   Solr: 12 secs -> 833 docs/sec
>         CPU: 28.62%
>         Heap: 9.88G
>         time(secs) for each 1k docs: 1 1 1 1 2 1 1 1 1 2
>
> Scenario 2: 100k
>   ES:   17 mins 44 secs -> 9.4 docs/sec
>         CPU: 54.73%
>         Heap: 47.99G
>         time(secs) for each 1k docs: 97 183 196 147 109 89 87 49 66 40
>         iowait: 0.02%
>         Index Size: 75M
>   Solr: 13 secs -> 769 docs/sec
>         CPU: 29.43%
>         Heap: 9.84G
>         time(secs) for each 1k docs: 2 1 1 1 1 1 1 1 2 2
>
> Scenario 3: 1M
>   ES:   183 mins 8 secs -> 0.9 docs/sec
>         CPU: 40.47%
>         Heap: 47.99G
>         time(secs) for each 1k docs: 133 422 701 958 989 1322 1622 1615 1630 1594
>   Solr: 15 secs -> 666.7 docs/sec
>         CPU: 45.10%
>         Heap: 9.64G
>         time(secs) for each 1k docs: 2 1 1 1 1 2 1 1 3 2
>
> Thanks!
> Cindy
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/4efc9c2d-ead4-4702-896d-dc32b5867859%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
ManyLuceneFields.java