Re: ingest performance degrades sharply as documents have more fields

Michael McCandless Tue, 17 Jun 2014 15:09:40 -0700

I tested roughly your Scenario 2 (100K unique fields, 100 fields per
document) with a straight Lucene test (attached, but not sure if the list
strips attachments).  Net/net I see ~100 docs/sec with one thread ... which
is very slow.
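The attached test is not reproduced in this archive. As a rough, hypothetical stand-in for the Scenario 2 data shape (100K unique field names, 100 fields per document), the Java sketch below only generates documents in memory and does not call Lucene. The names `ManyFieldsDocs`, `makeDoc`, `distinctFieldNames` and the `fieldN`/`valueN` naming scheme are assumptions for illustration. It also assumes field names are drawn uniformly at random; under that assumption, 10K such documents already touch almost all 100K names, so the index ends up carrying nearly every unique field.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Random;
import java.util.Set;

/**
 * Hypothetical generator for Scenario-2-style documents: each document gets
 * FIELDS_PER_DOC distinct field names drawn uniformly from a pool of
 * UNIQUE_FIELDS names. This is NOT the attached ManyLuceneFields.java test,
 * only an illustration of the field-name distribution.
 */
public class ManyFieldsDocs {
    static final int UNIQUE_FIELDS = 100_000;
    static final int FIELDS_PER_DOC = 100;

    /** Builds one document as a map from field name to a small value. */
    static Map<String, String> makeDoc(Random rnd) {
        Map<String, String> doc = new HashMap<>();
        // Keep drawing until the document has FIELDS_PER_DOC distinct names;
        // a repeated name just overwrites its value and does not grow the map.
        while (doc.size() < FIELDS_PER_DOC) {
            int id = rnd.nextInt(UNIQUE_FIELDS);
            doc.put("field" + id, "value" + rnd.nextInt(1000));
        }
        return doc;
    }

    /** Counts how many distinct field names appear across numDocs documents. */
    static int distinctFieldNames(int numDocs, long seed) {
        Random rnd = new Random(seed);
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < numDocs; i++) {
            seen.addAll(makeDoc(rnd).keySet());
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // With 10k docs x 100 fields, nearly every one of the 100k names shows up.
        System.out.println(distinctFieldNames(10_000, 42L));
    }
}
```

The rejection loop in `makeDoc` is the simplest way to get distinct names per document; with 100 names out of 100K, repeats are rare, so it almost never needs extra draws.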


Lucene stores quite a lot for each unique indexed field name and it's
really a bad idea to plan on having so many unique fields in the index:
you'll spend lots of RAM and CPU.

Can you describe the wider use case here?  Maybe there's a more performant
way to achieve it...



On Fri, Jun 13, 2014 at 2:40 PM, Cindy Hsin <cindy.h...@gmail.com> wrote:

> Hi, Mark:
>
> We are doing single-document ingestion. We did a performance comparison
> between Solr and Elasticsearch (ES).
> ES performance degrades dramatically as we increase the number of metadata
> fields, whereas Solr performance stays the same.
> The test uses a very small data set (i.e. 10k documents; the index size is
> only 75 MB). The machine is a high-spec machine with 48 GB of memory.
> ES performance drops by 50% even though the machine has plenty of memory,
> and ES consumes all of the machine's memory when the number of metadata
> fields increases to 100k.
> This behavior seems abnormal, since the data set is really tiny.
>
> We also tried larger data sets (i.e. 100k and 1M documents); ES throws an
> OOM (out-of-memory) error in Scenario 2 with 1M documents.
> We want to know whether this is a bug in ES and/or whether there is any
> workaround (a configuration step) we can use to eliminate the performance
> degradation.
> Currently ES performance does not meet the customer requirement, so we want
> to see if there is any way to bring ES performance to the same level as
> Solr.
>
> Below are the configuration settings and benchmark results for the
> 10k-document set.
> Scenario 0 means there are 1,000 different metadata fields in the system.
> Scenario 1 means there are 10k different metadata fields in the system.
> Scenario 2 means there are 100k different metadata fields in the system.
> Scenario 3 means there are 1M different metadata fields in the system.
>
>    - disable hard commit and soft commit, and use a *client* to commit
>    (ES & Solr) every 10 seconds
>       - ES: flush and refresh are disabled
>       - Solr: autoSoftCommit is disabled
>    - monitor the load on the system (CPU, memory, etc.) and how the
>    ingestion speed changes over time
>    - monitor the ingestion speed (is there any degradation over time?)
>    - new ES config: new_ES_config.sh
>    <https://stbeehive.oracle.com/content/dav/st/Cloud%20Search/Documents/new_ES_config.sh>
>    - new ES ingestion: new_ES_ingest_threads.pl
>    <https://stbeehive.oracle.com/content/dav/st/Cloud%20Search/Documents/new_ES_ingest_threads.pl>
>    - new Solr ingestion: new_Solr_ingest_threads.pl
>    <https://stbeehive.oracle.com/content/dav/st/Cloud%20Search/Documents/new_Solr_ingest_threads.pl>
>    - flush interval: 10s
>
>
> Results by number of different metadata fields (ES vs. Solr):
>
> Scenario 0: 1,000 fields
>   ES:   12 secs -> 833 docs/sec; CPU: 30.24%; heap: 1.08G;
>         index size: 36M; iowait: 0.02%
>         time (secs) for each 1k docs: 3 1 1 1 1 1 0 1 2 1
>   Solr: 13 secs -> 769 docs/sec; CPU: 28.85%; heap: 9.39G
>         time (secs) for each 1k docs: 2 1 1 1 1 1 1 1 2 2
>
> Scenario 1: 10k fields
>   ES:   29 secs -> 345 docs/sec; CPU: 40.83%; heap: 5.74G;
>         index size: 36M; iowait: 0.02%
>         time (secs) for each 1k docs: 14 2 2 2 1 2 2 1 2 1
>   Solr: 12 secs -> 833 docs/sec; CPU: 28.62%; heap: 9.88G
>         time (secs) for each 1k docs: 1 1 1 1 2 1 1 1 1 2
>
> Scenario 2: 100k fields
>   ES:   17 mins 44 secs -> 9.4 docs/sec; CPU: 54.73%; heap: 47.99G;
>         index size: 75M; iowait: 0.02%
>         time (secs) for each 1k docs: 97 183 196 147 109 89 87 49 66 40
>   Solr: 13 secs -> 769 docs/sec; CPU: 29.43%; heap: 9.84G
>         time (secs) for each 1k docs: 2 1 1 1 1 1 1 1 2 2
>
> Scenario 3: 1M fields
>   ES:   183 mins 8 secs -> 0.9 docs/sec; CPU: 40.47%; heap: 47.99G
>         time (secs) for each 1k docs: 133 422 701 958 989 1322 1622 1615 1630 1594
>   Solr: 15 secs -> 666.7 docs/sec; CPU: 45.10%; heap: 9.64G
>         time (secs) for each 1k docs: 2 1 1 1 1 2 1 1 3 2
>
> Thanks!
> Cindy
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/4efc9c2d-ead4-4702-896d-dc32b5867859%40googlegroups.com
> <https://groups.google.com/d/msgid/elasticsearch/4efc9c2d-ead4-4702-896d-dc32b5867859%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>
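The totals in Cindy's benchmark can be cross-checked against her per-1k-document timings: each run indexed 10,000 documents, so throughput is 10,000 divided by the summed seconds. A minimal Java sketch (the class and method names are hypothetical) using the Scenario 2 rows. Because each per-1k timing is rounded to whole seconds, the sums can differ by a second or two from the reported totals (1063 s summed vs. 17 min 44 s = 1064 s reported for ES).

```java
/**
 * Recomputes throughput from per-1k-document timings.
 * Assumes each run indexed 10,000 documents, as in the benchmark above.
 */
public class IngestThroughput {
    static final int DOCS = 10_000;

    /** Sums the per-1k-document timings into a total elapsed time in seconds. */
    static int totalSeconds(int[] secsPer1kDocs) {
        int total = 0;
        for (int s : secsPer1kDocs) {
            total += s;
        }
        return total;
    }

    /** Documents per second for the whole run. */
    static double docsPerSec(int totalSecs) {
        return (double) DOCS / totalSecs;
    }

    public static void main(String[] args) {
        int[] esScenario2 = {97, 183, 196, 147, 109, 89, 87, 49, 66, 40};
        int[] solrScenario2 = {2, 1, 1, 1, 1, 1, 1, 1, 2, 2};
        int es = totalSeconds(esScenario2);      // 1063 s (reported: 1064 s)
        int solr = totalSeconds(solrScenario2);  // 13 s
        System.out.printf("ES:   %d s -> %.1f docs/sec%n", es, docsPerSec(es));
        System.out.printf("Solr: %d s -> %.1f docs/sec%n", solr, docsPerSec(solr));
    }
}
```

For ES Scenario 2 this gives about 9.4 docs/sec, matching the reported figure, versus about 769 docs/sec for Solr on the same scenario.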

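The setup above turns off server-side commits (ES flush/refresh, Solr autoSoftCommit) and has a client commit every 10 seconds instead. A minimal sketch of such a client-side loop using the JDK's `ScheduledExecutorService`. `PeriodicCommitter` is a hypothetical name, and the commit action is a placeholder `Runnable` rather than a real ES or Solr client call, since the actual ingestion scripts (new_ES_ingest_threads.pl, new_Solr_ingest_threads.pl) are not reproduced here.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical client-side commit loop. Server-side auto commit/refresh is
 * assumed to be off; the client triggers commitAction on a fixed schedule.
 * Wiring commitAction to a real ES or Solr call is left open.
 */
public class PeriodicCommitter implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    /** Runs commitAction every intervalMillis (10,000 ms in the setup above). */
    public PeriodicCommitter(Runnable commitAction, long intervalMillis) {
        // scheduleWithFixedDelay measures the interval from the end of one commit
        // to the start of the next, so a slow commit pushes the next one back
        // instead of triggering back-to-back catch-up runs.
        scheduler.scheduleWithFixedDelay(commitAction, intervalMillis, intervalMillis,
                TimeUnit.MILLISECONDS);
    }

    /** Stops scheduling further commits and waits briefly for a running one. */
    @Override
    public void close() throws InterruptedException {
        scheduler.shutdown();
        scheduler.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger commits = new AtomicInteger();
        // Short interval for demonstration; the benchmark used 10 seconds.
        try (PeriodicCommitter c = new PeriodicCommitter(commits::incrementAndGet, 100)) {
            Thread.sleep(550);
        }
        System.out.println("commits issued: " + commits.get());
    }
}
```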

Attachment: ManyLuceneFields.java (binary data)
