[jira] [Created] (SOLR-3684) Frequently full gc while do pressure index

Raintung Li (JIRA) Fri, 27 Jul 2012 01:03:39 -0700

Raintung Li created SOLR-3684:
---------------------------------

             Summary: Frequently full gc while do pressure index
                 Key: SOLR-3684
                 URL: https://issues.apache.org/jira/browse/SOLR-3684
             Project: Solr
          Issue Type: Improvement
          Components: multicore
    Affects Versions: 4.0-ALPHA
         Environment: System: Linux
Java process: 4G memory
Jetty: 1000 threads 
Index: 20 fields
Cores: 5


            Reporter: Raintung Li
            Priority: Critical


Recently we tested Solr indexing throughput and performance. The test schema 
has 20 fields, all of the ordinary text_general type; Jetty runs with 1000 
threads, and 5 cores are defined.

After the test had run for some time, the throughput of the Solr process 
dropped sharply. Investigating the root cause, we found that the Java process 
was running full GCs almost continuously. 
The heap dump shows that the dominant objects are StandardTokenizer instances, 
which are held in the CloseableThreadLocal by IndexSchema.SolrIndexAnalyzer.
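
The retention pattern described above can be illustrated with a small sketch. It uses the JDK's plain java.lang.ThreadLocal in place of Lucene's CloseableThreadLocal, and the class name, the counter, and the 16384-char buffer size (32KB at 2 bytes per char) are assumptions made for the illustration, not Solr code. Each pool thread allocates its own buffer on first use and then keeps it for every later task it runs:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical demo: per-thread buffers in a pool are allocated once per thread. */
class ThreadLocalRetentionDemo {

  /** Counts how many buffers have been allocated across all threads. */
  static final AtomicInteger ALLOCATIONS = new AtomicInteger();

  /** Each thread lazily gets its own 16384-char buffer (assumed size). */
  static final ThreadLocal<char[]> BUFFER = new ThreadLocal<char[]>() {
    @Override
    protected char[] initialValue() {
      ALLOCATIONS.incrementAndGet();
      return new char[16 * 1024];
    }
  };

  /** Runs the tasks on a fixed pool and returns the number of buffers allocated. */
  static int run(int poolSize, int tasks) {
    ALLOCATIONS.set(0);
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    for (int i = 0; i < tasks; i++) {
      pool.execute(() -> BUFFER.get()[0] = 'x'); // touch this thread's buffer
    }
    pool.shutdown();
    try {
      pool.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for tasks", e);
    }
    return ALLOCATIONS.get();
  }

  public static void main(String[] args) {
    System.out.println("buffers allocated: " + run(4, 100));
  }
}
```

With at least as many tasks as pool threads, the count equals the pool size however many tasks are submitted, because a buffer stays attached to its thread until that thread ends.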

By default Solr uses PerFieldReuseStrategy as the component reuse strategy, 
which means each field gets its own StandardTokenizer if it uses the standard 
analyzer, and each StandardTokenizer occupies 32KB of memory because of its 
zzBuffer char array.

The worst case: Total memory = live threads*cores*fields*32KB

In the test case that is 1000*5*20*32KB = 3.2G for StandardTokenizer alone, and 
those objects can only be released when their thread dies.
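
The arithmetic can be checked with a short sketch. The class and method names are hypothetical, and the 32KB per-tokenizer footprint is simply the figure quoted above for zzBuffer. The product is 100,000 tokenizers, or 3,200,000KB, which is the 3.2G quoted; taking 1KB as 1024 bytes, that is 3,276,800,000 bytes against a 4G Java process:

```java
/** Hypothetical helper for the worst-case estimate; names are illustrative. */
class TokenizerMemoryEstimate {

  /** Assumed per-tokenizer footprint: the 32KB figure quoted for zzBuffer. */
  static final long BYTES_PER_TOKENIZER = 32L * 1024;

  /** Worst case: one tokenizer per (live thread, core, field) combination. */
  static long worstCaseBytes(int liveThreads, int cores, int fields) {
    // Cast first so the whole product is computed in long, not int.
    return (long) liveThreads * cores * fields * BYTES_PER_TOKENIZER;
  }

  public static void main(String[] args) {
    long bytes = worstCaseBytes(1000, 5, 20);
    System.out.println(bytes + " bytes");
    System.out.println((bytes / 1024) + " KB");
  }
}
```

The multiplication is done in long because the result exceeds the range of a 32-bit int.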

Suggestion:
Each request is handled by a single thread, so each document is analyzed by a 
single thread. That thread parses the document's fields one after another, so 
fields of the same type can share the same reusable components. When the thread 
moves on to another field of the same type, it only needs to reset the input 
stream of the shared components, which saves a lot of memory for fields of the 
same type.

Total memory would be = live threads*cores*(distinct field types)*32KB
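
The difference between the two keying schemes can be modelled with a minimal sketch. This is not Lucene or Solr code: the class, the Components stand-in, and the use of a type-name string in place of an Analyzer key are all assumptions made for illustration. For the test schema (20 fields, one type) a thread would cache 20 components when keyed by field name but only 1 when keyed by type, which would bring the estimate down to 1000*5*1*32KB = 160,000KB:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of the two keying schemes; not Lucene or Solr code. */
class ReuseKeyingDemo {

  /** Stand-in for a tokenizer holding a 16384-char (32KB) buffer. */
  static final class Components {
    final char[] buffer = new char[16 * 1024];
  }

  /** Components one thread caches when the cache is keyed by field name. */
  static int cachedPerFieldName(Map<String, String> fieldToType) {
    Map<String, Components> cache = new HashMap<>();
    for (String field : fieldToType.keySet()) {
      cache.computeIfAbsent(field, k -> new Components());
    }
    return cache.size();
  }

  /** Components one thread caches when the cache is keyed by field type. */
  static int cachedPerFieldType(Map<String, String> fieldToType) {
    Map<String, Components> cache = new HashMap<>();
    for (String type : fieldToType.values()) {
      cache.computeIfAbsent(type, k -> new Components());
    }
    return cache.size();
  }

  /** Builds a mutable schema of n fields (field0, field1, ...) of one type. */
  static Map<String, String> uniformSchema(int n, String type) {
    Map<String, String> schema = new LinkedHashMap<>();
    for (int i = 0; i < n; i++) {
      schema.put("field" + i, type);
    }
    return schema;
  }

  public static void main(String[] args) {
    Map<String, String> schema = uniformSchema(20, "text_general");
    System.out.println("keyed by field name: " + cachedPerFieldName(schema));
    System.out.println("keyed by field type: " + cachedPerFieldType(schema));
  }
}
```

Sharing is safe in this model only because a single thread processes one field at a time, which is the premise of the suggestion.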

The source code change is simple; I can provide a patch for 
IndexSchema.java: 
  private class SolrIndexAnalyzer extends AnalyzerWrapper {

    /**
     * Implementation of {@link ReuseStrategy} that reuses components per
     * analyzer by maintaining a Map of TokenStreamComponents keyed by Analyzer,
     * so fields that share an analyzer share one set of components per thread.
     */
    private class SolrFieldReuseStrategy extends ReuseStrategy {

      /**
       * {@inheritDoc}
       */
      @SuppressWarnings("unchecked")
      public TokenStreamComponents getReusableComponents(String fieldName) {
        Map<Analyzer, TokenStreamComponents> componentsPerField =
            (Map<Analyzer, TokenStreamComponents>) getStoredValue();
        // Look up via getWrappedAnalyzer so that dynamic fields, which are not
        // in the analyzers map, do not all share the null key.
        return componentsPerField != null ?
            componentsPerField.get(getWrappedAnalyzer(fieldName)) : null;
      }

      /**
       * {@inheritDoc}
       */
      @SuppressWarnings("unchecked")
      public void setReusableComponents(String fieldName,
          TokenStreamComponents components) {
        Map<Analyzer, TokenStreamComponents> componentsPerField =
            (Map<Analyzer, TokenStreamComponents>) getStoredValue();
        if (componentsPerField == null) {
          componentsPerField = new HashMap<Analyzer, TokenStreamComponents>();
          setStoredValue(componentsPerField);
        }
        componentsPerField.put(getWrappedAnalyzer(fieldName), components);
      }
    }

    // Maps each explicitly declared field name to its analyzer. An instance
    // field (not static) so that SolrQueryAnalyzer gets its own map.
    protected final HashMap<String, Analyzer> analyzers;

    SolrIndexAnalyzer() {
      super(new SolrFieldReuseStrategy());
      analyzers = analyzerCache();
    }

    protected HashMap<String, Analyzer> analyzerCache() {
      HashMap<String, Analyzer> cache = new HashMap<String, Analyzer>();
      for (SchemaField f : getFields().values()) {
        Analyzer analyzer = f.getType().getAnalyzer();
        cache.put(f.getName(), analyzer);
      }
      return cache;
    }

    @Override
    protected Analyzer getWrappedAnalyzer(String fieldName) {
      Analyzer analyzer = analyzers.get(fieldName);
      return analyzer != null ? analyzer : getDynamicFieldType(fieldName).getAnalyzer();
    }

    @Override
    protected TokenStreamComponents wrapComponents(String fieldName,
        TokenStreamComponents components) {
      return components;
    }
  }

  private class SolrQueryAnalyzer extends SolrIndexAnalyzer {
    @Override
    protected HashMap<String, Analyzer> analyzerCache() {
      HashMap<String, Analyzer> cache = new HashMap<String, Analyzer>();
      for (SchemaField f : getFields().values()) {
        Analyzer analyzer = f.getType().getQueryAnalyzer();
        cache.put(f.getName(), analyzer);
      }
      return cache;
    }

    @Override
    protected Analyzer getWrappedAnalyzer(String fieldName) {
      Analyzer analyzer = analyzers.get(fieldName);
      return analyzer != null ? analyzer : getDynamicFieldType(fieldName).getQueryAnalyzer();
    }
  }




