[ https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429985#comment-13429985 ]

Eks Dev commented on SOLR-3684:
-------------------------------

We did this a long time ago on Tomcat. We use particularly expensive 
analyzers, so even for searching the optimum is around the number of CPU 
cores. That was actually the only big problem we had with Solr.
 
Anything that keeps thread churn low helps. It is not only the maximum 
number of threads: the TTL for idle threads should also be increased. The 
longer threads live, the better. Solr is completely safe here thanks to 
core reloading and smart index management, so there is no point in 
recycling threads.

If one needs to queue requests, that is a separate problem; even then there 
is no need to raise the maximum number of worker threads beyond the number 
of CPU cores plus some smallish constant.
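
For example, on Jetty (the container in this issue's environment) those two 
knobs look roughly like this. This is a minimal sketch against Jetty 9's 
QueuedThreadPool API (older Jetty versions name the idle-timeout setter 
differently); the numbers are placeholders, not recommendations:

    import org.eclipse.jetty.server.Server;
    import org.eclipse.jetty.util.thread.QueuedThreadPool;

    public class LowChurnJetty {
      public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();

        // Cap workers near the core count plus a smallish constant, and give
        // idle threads a long TTL so they are rarely recycled.
        QueuedThreadPool pool = new QueuedThreadPool();
        pool.setMaxThreads(cores + 8);        // placeholder constant
        pool.setMinThreads(cores);            // warm baseline
        pool.setIdleTimeout(30 * 60 * 1000);  // 30-minute idle TTL, in ms

        Server server = new Server(pool);
        // ... add connectors and deploy the Solr webapp as usual ...
        server.start();
        server.join();
      }
    }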

What we would like to achieve is separate thread pools for searching, 
indexing, and "the rest", but we never managed to figure out how to do it. 
Even benign requests (/ping, /status, and so on) increase thread churn. If 
we could configure separate pools, we would keep a small number of 
long-lived threads for searching, an even smaller number for indexing, and 
one "who cares" pool for the rest. It is supposedly possible on Tomcat; if 
someone knows how to do it, please share.
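
Purely to illustrate the intent (this is not the missing Tomcat 
configuration; the pool sizes and path prefixes below are made-up 
placeholders), the routing idea in plain java.util.concurrent terms:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Conceptual sketch: dedicated bounded pools per request class, so cheap
    // /ping and /status traffic cannot churn the long-lived search threads.
    public class RequestPools {
      private final ExecutorService searchPool = Executors.newFixedThreadPool(16);
      private final ExecutorService indexPool  = Executors.newFixedThreadPool(4);
      private final ExecutorService restPool   = Executors.newCachedThreadPool();

      public ExecutorService poolFor(String path) {
        if (path.startsWith("/select")) return searchPool;  // searching
        if (path.startsWith("/update")) return indexPool;   // indexing
        return restPool;                                    // /ping, /status, ...
      }
    }

A container would still have to dispatch its accepted connections through 
something like poolFor(), and that is exactly the hook we never found a 
clean way to configure.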
                
> Frequent full GC while indexing under pressure
> ----------------------------------------------
>
>                 Key: SOLR-3684
>                 URL: https://issues.apache.org/jira/browse/SOLR-3684
>             Project: Solr
>          Issue Type: Improvement
>          Components: multicore
>    Affects Versions: 4.0-ALPHA
>         Environment: System: Linux
> Java process: 4G memory
> Jetty: 1000 threads 
> Index: 20 fields
> Cores: 5
>            Reporter: Raintung Li
>            Priority: Critical
>              Labels: garbage, performance
>             Fix For: 4.0
>
>         Attachments: patch.txt
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Recently we tested Solr indexing throughput and performance: 20 fields of the 
> normal text_general type, 1000 Jetty threads, and 5 cores.
> After the test had run for some time, the throughput of the Solr process 
> dropped very quickly. Investigating the root cause, we found the Java process 
> constantly doing full GCs.
> Checking the heap dump, the dominant objects are StandardTokenizer instances, 
> which IndexSchema.SolrIndexAnalyzer stores in a CloseableThreadLocal.
> Solr uses PerFieldReuseStrategy as the default component-reuse strategy, 
> which means each field gets its own StandardTokenizer if it uses the standard 
> analyzer, and every StandardTokenizer occupies 32KB of memory because of its 
> zzBuffer char array.
> Worst case: total memory = live threads * cores * fields * 32KB.
> In this test that is 1000 * 5 * 20 * 32KB = 3.2GB just for StandardTokenizer, 
> and those objects can only be released when their thread dies.
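> As a quick sanity check of that arithmetic (a throwaway sketch; 32KB per 
> tokenizer is the zzBuffer figure above):
>
>   public class WorstCase {
>     public static void main(String[] args) {
>       long threads = 1000, cores = 5, fields = 20;
>       long perTokenizer = 32L * 1024;                       // one zzBuffer
>       long bytes = threads * cores * fields * perTokenizer; // 3,276,800,000
>       System.out.println(bytes / (1024.0 * 1024 * 1024) + " GiB"); // ~3.05
>     }
>   }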
> Suggestion:
> Each request is handled by exactly one thread, which means one document is 
> analyzed by only one thread. That thread parses the document's fields one by 
> one, so fields of the same type can reuse the same components: when analysis 
> switches to another field of the same type, only the component's input stream 
> needs to be reset. This saves a lot of memory when many fields share a type.
> Total memory becomes = live threads * cores * (distinct field types) * 32KB.
> The source-code modification is simple; I can provide the patch for 
> IndexSchema.java:
> private class SolrIndexAnalyzer extends AnalyzerWrapper {
>
>   /**
>    * Implementation of {@link ReuseStrategy} that reuses components per
>    * analyzer (rather than per field) by maintaining a Map of
>    * TokenStreamComponents keyed by Analyzer, so fields sharing an analyzer
>    * also share components.
>    */
>   private class SolrFieldReuseStrategy extends ReuseStrategy {
>     /**
>      * {@inheritDoc}
>      */
>     @SuppressWarnings("unchecked")
>     public TokenStreamComponents getReusableComponents(String fieldName) {
>       Map<Analyzer, TokenStreamComponents> componentsPerField =
>           (Map<Analyzer, TokenStreamComponents>) getStoredValue();
>       return componentsPerField != null ? componentsPerField.get(analyzers.get(fieldName)) : null;
>     }
>
>     /**
>      * {@inheritDoc}
>      */
>     @SuppressWarnings("unchecked")
>     public void setReusableComponents(String fieldName, TokenStreamComponents components) {
>       Map<Analyzer, TokenStreamComponents> componentsPerField =
>           (Map<Analyzer, TokenStreamComponents>) getStoredValue();
>       if (componentsPerField == null) {
>         componentsPerField = new HashMap<Analyzer, TokenStreamComponents>();
>         setStoredValue(componentsPerField);
>       }
>       componentsPerField.put(analyzers.get(fieldName), components);
>     }
>   }
>
>   // field name -> analyzer, built once from the schema
>   protected final HashMap<String, Analyzer> analyzers;
>
>   SolrIndexAnalyzer() {
>     super(new SolrFieldReuseStrategy());
>     analyzers = analyzerCache();
>   }
>
>   protected HashMap<String, Analyzer> analyzerCache() {
>     HashMap<String, Analyzer> cache = new HashMap<String, Analyzer>();
>     for (SchemaField f : getFields().values()) {
>       cache.put(f.getName(), f.getType().getAnalyzer());
>     }
>     return cache;
>   }
>
>   @Override
>   protected Analyzer getWrappedAnalyzer(String fieldName) {
>     Analyzer analyzer = analyzers.get(fieldName);
>     return analyzer != null ? analyzer : getDynamicFieldType(fieldName).getAnalyzer();
>   }
>
>   @Override
>   protected TokenStreamComponents wrapComponents(String fieldName, TokenStreamComponents components) {
>     return components;
>   }
> }
>
> private class SolrQueryAnalyzer extends SolrIndexAnalyzer {
>   @Override
>   protected HashMap<String, Analyzer> analyzerCache() {
>     HashMap<String, Analyzer> cache = new HashMap<String, Analyzer>();
>     for (SchemaField f : getFields().values()) {
>       cache.put(f.getName(), f.getType().getQueryAnalyzer());
>     }
>     return cache;
>   }
>
>   @Override
>   protected Analyzer getWrappedAnalyzer(String fieldName) {
>     Analyzer analyzer = analyzers.get(fieldName);
>     return analyzer != null ? analyzer : getDynamicFieldType(fieldName).getQueryAnalyzer();
>   }
> }

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


