[ https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raintung Li updated SOLR-3684:
------------------------------
    Attachment: patch.txt

> Frequently full gc while do pressure index
> ------------------------------------------
>
>                 Key: SOLR-3684
>                 URL: https://issues.apache.org/jira/browse/SOLR-3684
>             Project: Solr
>          Issue Type: Improvement
>          Components: multicore
>    Affects Versions: 4.0-ALPHA
>         Environment: System: Linux
>                      Java process: 4G memory
>                      Jetty: 1000 threads
>                      Index: 20 fields
>                      Cores: 5
>            Reporter: Raintung Li
>            Priority: Critical
>              Labels: garbage, performance
>             Fix For: 4.0
>
>         Attachments: patch.txt
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> We recently tested Solr indexing throughput and performance with 20 fields
> (all of the ordinary text_general type), 1000 Jetty threads, and 5 cores.
> After the test had run for a while, the Solr process's throughput dropped
> sharply. Investigating the root cause, we found that the Java process was
> constantly performing full GCs.
> A heap dump showed that the dominant objects were StandardTokenizer
> instances, held in the CloseableThreadLocal used by
> IndexSchema.SolrIndexAnalyzer.
> Solr uses PerFieldReuseStrategy as its default reuse strategy, so every
> field that uses the standard analyzer gets its own StandardTokenizer, and
> each StandardTokenizer occupies about 32KB of memory for its zzBuffer char
> array.
> Worst case: total memory = live threads * cores * fields * 32KB.
> In our test that is 1000 * 5 * 20 * 32KB = 3.2G just for StandardTokenizer
> instances, and these objects are only released when their thread dies.
> Suggestion:
> Each request is handled by a single thread, so a given document is analyzed
> by exactly one thread. Because that thread parses the document's fields one
> at a time, fields of the same type can share a single reused component:
> when the thread moves to another field of the same type, only the
> component's input stream needs to be reset. This saves a lot of memory for
> fields of the same type.
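A quick back-of-the-envelope sketch of the arithmetic above. The class and method names are illustrative only (not part of the attached patch); the 32KB-per-tokenizer figure and the test parameters (1000 threads, 5 cores, 20 fields) are taken from the description.

```java
// Sanity check of the memory estimates in the report (hypothetical helper,
// not part of patch.txt).
public class TokenizerMemoryEstimate {
    // zzBuffer char array held by each cached StandardTokenizer (per report).
    static final long ZZ_BUFFER_BYTES = 32 * 1024;

    // Current behavior (PerFieldReuseStrategy): one cached tokenizer per
    // live thread, per core, per field.
    static long perFieldBytes(long threads, long cores, long fields) {
        return threads * cores * fields * ZZ_BUFFER_BYTES;
    }

    // Proposed behavior: fields sharing an analyzer (i.e. the same field
    // type) also share the cached components.
    static long perTypeBytes(long threads, long cores, long fieldTypes) {
        return threads * cores * fieldTypes * ZZ_BUFFER_BYTES;
    }

    public static void main(String[] args) {
        // Worst case from the report: 1000 * 5 * 20 * 32KB.
        System.out.println(perFieldBytes(1000, 5, 20)); // 3276800000 (~3.2 GB)
        // If all 20 fields share the one text_general analyzer (1 type):
        System.out.println(perTypeBytes(1000, 5, 1));   // 163840000 (~160 MB)
    }
}
```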
> Total memory then becomes = live threads * cores * (distinct field types) * 32KB.
> The source change is simple; I can provide the patch for IndexSchema.java:
>
>   private class SolrIndexAnalyzer extends AnalyzerWrapper {
>
>     /**
>      * Implementation of {@link ReuseStrategy} that reuses components per
>      * analyzer by maintaining a Map of TokenStreamComponents per analyzer.
>      */
>     private class SolrFieldReuseStrategy extends ReuseStrategy {
>
>       /** {@inheritDoc} */
>       @SuppressWarnings("unchecked")
>       public TokenStreamComponents getReusableComponents(String fieldName) {
>         Map<Analyzer, TokenStreamComponents> componentsPerField =
>             (Map<Analyzer, TokenStreamComponents>) getStoredValue();
>         return componentsPerField != null
>             ? componentsPerField.get(analyzers.get(fieldName)) : null;
>       }
>
>       /** {@inheritDoc} */
>       @SuppressWarnings("unchecked")
>       public void setReusableComponents(String fieldName,
>                                         TokenStreamComponents components) {
>         Map<Analyzer, TokenStreamComponents> componentsPerField =
>             (Map<Analyzer, TokenStreamComponents>) getStoredValue();
>         if (componentsPerField == null) {
>           componentsPerField = new HashMap<Analyzer, TokenStreamComponents>();
>           setStoredValue(componentsPerField);
>         }
>         componentsPerField.put(analyzers.get(fieldName), components);
>       }
>     }
>
>     protected final HashMap<String, Analyzer> analyzers;
>
>     SolrIndexAnalyzer() {
>       super(new SolrFieldReuseStrategy());
>       analyzers = analyzerCache();
>     }
>
>     protected HashMap<String, Analyzer> analyzerCache() {
>       HashMap<String, Analyzer> cache = new HashMap<String, Analyzer>();
>       for (SchemaField f : getFields().values()) {
>         cache.put(f.getName(), f.getType().getAnalyzer());
>       }
>       return cache;
>     }
>
>     @Override
>     protected Analyzer getWrappedAnalyzer(String fieldName) {
>       Analyzer analyzer = analyzers.get(fieldName);
>       return analyzer != null
>           ? analyzer : getDynamicFieldType(fieldName).getAnalyzer();
>     }
>
>     @Override
>     protected TokenStreamComponents wrapComponents(String fieldName,
>                                                    TokenStreamComponents components) {
>       return components;
>     }
>   }
>
>   private class SolrQueryAnalyzer extends SolrIndexAnalyzer {
>     @Override
>     protected HashMap<String, Analyzer> analyzerCache() {
>       HashMap<String, Analyzer> cache = new HashMap<String, Analyzer>();
>       for (SchemaField f : getFields().values()) {
>         cache.put(f.getName(), f.getType().getQueryAnalyzer());
>       }
>       return cache;
>     }
>
>     @Override
>     protected Analyzer getWrappedAnalyzer(String fieldName) {
>       Analyzer analyzer = analyzers.get(fieldName);
>       return analyzer != null
>           ? analyzer : getDynamicFieldType(fieldName).getQueryAnalyzer();
>     }
>   }

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org