Re: ingest performance degrades sharply along with the documents having more fields

2014-06-24 Thread Cindy Hsin
Looks like memory usage still increased a lot at 10k fields even with these two 
parameters disabled.

Based on the experiments we have done, it looks like ES has abnormal memory 
usage and performance degradation when the number of fields is large (i.e. 
10k), whereas Solr's memory usage and performance remain stable with a large 
number of fields.

If we are only looking at the 10k-fields scenario, is there a way for ES to 
make the ingest performance better (perhaps via a bug fix)? Looking at the 
performance numbers, I think this abnormal memory usage and performance drop 
is most likely a bug in the ES layer. If a fix is not technically feasible, then 
we'll report back that we have checked with ES experts and confirmed that 
there is no way for ES to address this issue. The solution Mike suggested 
sounds like a workaround (i.e. combine multiple fields into one field to reduce 
the number of distinct fields; a sketch of the idea is below). I can run it by 
our team, but I am not sure it will fly.
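
For reference, here is a minimal sketch of what that kind of workaround could 
look like (the field names and the "name=value" encoding are placeholders I made 
up, not Mike's exact proposal): the client folds an arbitrary set of metadata 
fields into one catch-all field before indexing, so the index only ever sees a 
single field.

    # Sketch of the "combine many fields into one" idea (hypothetical encoding).
    def combine_fields(doc):
        # Encode every metadata field as a "<name>=<value>" token inside one
        # catch-all "meta" field, so the number of distinct index fields stays
        # constant no matter how many metadata keys the documents use.
        tokens = ["%s=%s" % (name, value) for name, value in sorted(doc.items())]
        return {"meta": tokens}

    # A document that would otherwise create three dynamic fields:
    original = {"color_ss": "red", "size_i": 42, "created_dt": "2014-06-12"}
    print(combine_fields(original))
    # -> {'meta': ['color_ss=red', 'created_dt=2014-06-12', 'size_i=42']}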

I have also asked Maco to run one more benchmark (where search and ingest 
run concurrently) for both ES and Solr, to check whether there is any 
performance degradation for Solr when search and ingest happen 
concurrently. I think this is one point that Mike mentioned, right? Even 
with Solr, you think we will hit some performance issue with a large number 
of fields when ingest and query run concurrently.
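
As a rough illustration (not Maco's actual scripts), that concurrent run against 
ES could be driven by something as simple as two threads; the URL, index/type 
names, and the generated field names below are placeholders:

    # Toy concurrent ingest + search load against ES (placeholder URL and names).
    import json
    import threading
    import time
    import requests

    ES = "http://localhost:9200/doc/type"
    STOP = time.time() + 60  # run for one minute

    def ingest():
        i = 0
        while time.time() < STOP:
            body = {"f%d_ss" % (i % 10000): "value%d" % i}  # dynamic field names
            requests.post(ES + "/", data=json.dumps(body))  # index one document
            i += 1

    def search():
        while time.time() < STOP:
            requests.post(ES + "/_search",
                          data=json.dumps({"query": {"match_all": {}}}))
            time.sleep(0.1)

    threads = [threading.Thread(target=ingest), threading.Thread(target=search)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()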

Thanks!
Cindy

On Thursday, June 12, 2014 10:57:23 PM UTC-7, Maco Ma wrote:

 I am trying to measure the performance of ingesting documents that have lots of 
 fields.


 The latest elasticsearch 1.2.1:
 Total docs count: 10k (definitely a small set)
 ES_HEAP_SIZE: 48G
 settings:

 {"doc": {"settings": {"index": {
   "uuid": "LiWHzE5uQrinYW1wW4E3nA",
   "number_of_replicas": 0,
   "translog": {"disable_flush": true},
   "number_of_shards": 5,
   "refresh_interval": -1,
   "version": {"created": 1020199}
 }}}}

 mappings:

 {"doc": {"mappings": {"type": {
   "dynamic_templates": [
     {"t1": {"mapping": {"store": false, "norms": {"enabled": false}, "type": "string"}, "match": "*_ss"}},
     {"t2": {"mapping": {"store": false, "type": "date"}, "match": "*_dt"}},
     {"t3": {"mapping": {"store": false, "type": "integer"}, "match": "*_i"}}
   ],
   "_source": {"enabled": false},
   "properties": {}
 }}}}

 All fields in the documents match the templates in the mappings.
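
 For instance (made-up values), a document whose keys all match the three 
 templates might look like this:

     # Hypothetical document; every field name matches a *_ss / *_dt / *_i template.
     doc = {
         "title_ss": "hello world",
         "created_dt": "2014-06-12T22:57:23Z",
         "count_i": 3,
     }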

 Since I disabled flush & refresh, I submit a flush command (followed by an 
 optimize command) from the client program every 10 seconds. (I also tried a 
 10-minute interval and got similar results.)
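
 That periodic flush/optimize step amounts to roughly the following against the 
 ES 1.x REST API (host and index name are placeholders; the real driver is a 
 separate client script):

     # Minimal sketch of the client-side flush + optimize loop described above.
     import time
     import requests

     INDEX_URL = "http://localhost:9200/doc"  # placeholder index URL

     while True:
         requests.post(INDEX_URL + "/_flush")     # flush the translog
         requests.post(INDEX_URL + "/_optimize")  # merge segments (ES 1.x _optimize API)
         time.sleep(10)                           # repeat every 10 seconds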

 Scenario 0 - 10k docs with 1000 different fields:
 Ingestion took 12 secs. Only 1.08G of heap memory is used (this counts used 
 heap memory only).


 Scenario 1 - 10k docs with 10k different fields (10x the fields of scenario 0):
 This time ingestion took 29 secs. Only 5.74G of heap memory is used.

 Not sure why the performance degrades so sharply.

 If I try to ingest docs having 100k different fields, it takes 17 mins 44 secs. 
 We only have 10k docs in total, and I am not sure why ES performs so 
 badly.

 Can anyone give suggestions to improve the performance?











Re: ingest performance degrades sharply along with the documents having more fields

2014-06-23 Thread Cindy Hsin
Thanks!

I have asked Maco to re-test ES with these two parameters disabled.

One more question regarding Lucene's capability with a large number of metadata 
fields: what is the largest number of metadata fields Lucene supports per index?
What are the different strategies for solving the large-metadata-fields issue? Do 
you recommend using types to partition different sets of metadata fields within 
an index (see the sketch below)?
I will also clarify with our team how they intend to use large numbers of 
metadata fields.
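
To make sure I am asking the right question, partitioning by type might look 
roughly like the following against the ES 1.x mapping API; the index, type, and 
field names are made up, and whether this actually reduces the per-index field 
pressure is exactly what I would like confirmed:

    # Hypothetical "partition metadata fields by type" layout (ES 1.x mapping API).
    import json
    import requests

    ES = "http://localhost:9200"

    requests.put(ES + "/metadata")  # create the index with default settings

    group_a = {"typeA": {"properties": {"author_ss": {"type": "string"},
                                        "created_dt": {"type": "date"}}}}
    group_b = {"typeB": {"properties": {"size_i": {"type": "integer"},
                                        "region_ss": {"type": "string"}}}}

    # Each type only declares the subset of metadata fields its documents use.
    requests.put(ES + "/metadata/typeA/_mapping", data=json.dumps(group_a))
    requests.put(ES + "/metadata/typeB/_mapping", data=json.dumps(group_b))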

Thanks!
Cindy



Re: ingest performance degrades sharply along with the documents having more fields

2014-06-17 Thread Cindy Hsin
The way we make Solr ingest faster (single-document ingest) is by turning off the 
engine's soft commit and hard commit and using a client to commit the changes 
every 10 seconds (see the sketch below).
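
Roughly, with autoCommit and autoSoftCommit disabled in solrconfig.xml, the 
client-driven commit amounts to something like this (host and core name are 
placeholders):

    # Sketch of the client-side commit loop used for Solr: explicit commits
    # every 10 seconds via the update handler, with auto commits disabled.
    import time
    import requests

    SOLR_UPDATE = "http://localhost:8983/solr/collection1/update"  # placeholder core

    while True:
        requests.get(SOLR_UPDATE, params={"commit": "true"})
        time.sleep(10)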

Solr ingest speed remains at 800 docs per second, whereas ES ingest speed 
drops by half when we increase the number of fields (i.e. from 1000 to 10k).
I have asked Maco to send you the requested script so you can do more 
analysis.

If you can help solve the first level of ES performance degradation (i.e. 
1000 to 10k fields) as a starting point, that would be best.

We do have real customer scenarios that require a large number of metadata 
fields, which is why this is a blocking issue for the stack evaluation 
between Solr and Elasticsearch.

Thanks!
Cindy



Re: ingest performance degrades sharply along with the documents having more fields

2014-06-13 Thread Cindy Hsin
Hi, Mark:

We are doing single-document ingestion. We did a performance comparison 
between Solr and Elasticsearch (ES).
ES performance degrades dramatically when we increase the number of metadata 
fields, whereas Solr performance remains the same.
The test uses a very small data set (i.e. 10k documents; the index size is only 
75MB). The machine is a high-spec machine with 48GB of memory.
You can see ES performance drop 50% even though the machine has plenty of 
memory. ES consumes all of the machine's memory when the metadata field count 
is increased to 100k.
This behavior seems abnormal since the data is really tiny.

We also tried larger data sets (i.e. 100k and 1 million documents); ES threw 
out-of-memory errors for scenario 2 in the 1 million doc run.
We want to know whether this is a bug in ES and/or whether there is any 
workaround (configuration step) we can use to eliminate the performance 
degradation.
Currently ES performance does not meet the customer requirement, so we want 
to see if there is any way we can bring ES performance to the same level as 
Solr.

Below are the configuration settings and benchmark results for the 10k document 
set.
Scenario 0 means there are 1000 different metadata fields in the system.
Scenario 1 means there are 10k different metadata fields in the system.
Scenario 2 means there are 100k different metadata fields in the system.
Scenario 3 means there are 1M different metadata fields in the system.

   - disable hard commit & soft commit + use a *client* to do the commit (ES & 
   Solr) every 10 seconds
      - ES: flush and refresh are disabled
      - Solr: autoSoftCommit is disabled
   - monitor load on the system (cpu, memory, etc.) and the change in ingestion 
   speed over time
   - monitor the ingestion speed (is there any degradation over time?)
   - new ES config: new_ES_config.sh 
   https://stbeehive.oracle.com/content/dav/st/Cloud%20Search/Documents/new_ES_config.sh
   - new ES ingestion: new_ES_ingest_threads.pl 
   https://stbeehive.oracle.com/content/dav/st/Cloud%20Search/Documents/new_ES_ingest_threads.pl
   - new Solr ingestion: new_Solr_ingest_threads.pl 
   https://stbeehive.oracle.com/content/dav/st/Cloud%20Search/Documents/new_Solr_ingest_threads.pl
   - flush interval: 10s

Results by number of different metadata fields (ES vs. Solr):

Scenario 0: 1000 fields
  ES:   12 secs - 833 docs/sec; CPU: 30.24%; Heap: 1.08G; iowait: 0.02%; index size: 36M
        time (secs) per 1k docs: 3 1 1 1 1 1 0 1 2 1
  Solr: 13 secs - 769 docs/sec; CPU: 28.85%; Heap: 9.39G
        time (secs) per 1k docs: 2 1 1 1 1 1 1 1 2 2

Scenario 1: 10k fields
  ES:   29 secs - 345 docs/sec; CPU: 40.83%; Heap: 5.74G; iowait: 0.02%; index size: 36M
        time (secs) per 1k docs: 14 2 2 2 1 2 2 1 2 1
  Solr: 12 secs - 833 docs/sec; CPU: 28.62%; Heap: 9.88G
        time (secs) per 1k docs: 1 1 1 1 2 1 1 1 1 2

Scenario 2: 100k fields
  ES:   17 mins 44 secs - 9.4 docs/sec; CPU: 54.73%; Heap: 47.99G; iowait: 0.02%; index size: 75M
        time (secs) per 1k docs: 97 183 196 147 109 89 87 49 66 40
  Solr: 13 secs - 769 docs/sec; CPU: 29.43%; Heap: 9.84G
        time (secs) per 1k docs: 2 1 1 1 1 1 1 1 2 2

Scenario 3: 1M fields
  ES:   183 mins 8 secs - 0.9 docs/sec; CPU: 40.47%; Heap: 47.99G
        time (secs) per 1k docs: 133 422 701 958 989 1322 1622 1615 1630 1594
  Solr: 15 secs - 666.7 docs/sec; CPU: 45.10%; Heap: 9.64G
        time (secs) per 1k docs: 2 1 1 1 1 2 1 1 3 2

Thanks!
Cindy
