[
https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429287#comment-13429287
]
Robert Muir commented on SOLR-3684:
-----------------------------------
FYI: I lowered the jflex buffer sizes from 32KB to 8KB in LUCENE-4291.
So I think we should still:
# Address the default Jetty threadpool size of max=10,000. This is the real
issue (a sketch of capping the pool follows this list).
# See if we can deal with the crazy corner case so we can implement your patch
(reuse by field type), which I think is a good separate improvement.
> Frequent full GC during indexing load tests
> -------------------------------------------
>
> Key: SOLR-3684
> URL: https://issues.apache.org/jira/browse/SOLR-3684
> Project: Solr
> Issue Type: Improvement
> Components: multicore
> Affects Versions: 4.0-ALPHA
> Environment: System: Linux
> Java process: 4G memory
> Jetty: 1000 threads
> Index: 20 fields
> Cores: 5
> Reporter: Raintung Li
> Priority: Critical
> Labels: garbage, performance
> Fix For: 4.0
>
> Attachments: patch.txt
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> We recently tested Solr index throughput and performance: 20 fields of the
> ordinary text_general type, 1000 Jetty threads, and 5 cores.
> After the test had run for some time, the Solr process's throughput dropped
> sharply. Checking the root cause, we found the Java process constantly doing
> full GC.
> Checking the heap dump, the dominant objects are StandardTokenizer
> instances, held in the CloseableThreadLocal used by
> IndexSchema.SolrIndexAnalyzer.
> Solr uses PerFieldReuseStrategy as the default component-reuse strategy,
> which means each field gets its own StandardTokenizer if it uses the
> standard analyzer, and each StandardTokenizer occupies 32KB of memory for
> its zzBuffer char array.
> The worst case: total memory = live threads * cores * fields * 32KB.
> In this test that is 1000 * 5 * 20 * 32KB = 3.2GB for StandardTokenizer
> alone, and those objects can only be released when their thread dies.
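> As a back-of-the-envelope check, the worst case above in plain Java (a
> sketch only; the constants are the figures from this test):
>
>     // One StandardTokenizer per live thread per core per field,
>     // each holding a 32KB zzBuffer char array.
>     long liveThreads = 1000, cores = 5, fields = 20;
>     long zzBufferBytes = 32L * 1024;
>     long worstCaseBytes = liveThreads * cores * fields * zzBufferBytes;
>     // 100,000 tokenizers * 32,768 bytes = 3,276,800,000 bytes ~ 3.2GB
>     System.out.printf("%.2f GB%n", worstCaseBytes / 1e9);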
> Suggestion:
> Each request is handled by exactly one thread, which means one document is
> analyzed by only one thread. Because that thread parses the document's
> fields one after another, fields of the same field type can share the same
> reused components: when the thread switches to another field of the same
> type, only the component's input stream has to be reset. This saves a lot
> of memory for fields of the same type.
> Total memory becomes = live threads * cores * (distinct field types) * 32KB.
> For example, in this test all 20 fields are text_general, so the
> StandardTokenizer footprint would drop to 1000 * 5 * 1 * 32KB, about 160MB.
> The source-code change is simple; I can provide the modification patch for
> IndexSchema.java:
> private class SolrIndexAnalyzer extends AnalyzerWrapper {
>
>   /**
>    * Implementation of {@link ReuseStrategy} that reuses components per
>    * field type, by keying the stored TokenStreamComponents on the field's
>    * Analyzer rather than on the field name.
>    */
>   private class SolrFieldReuseStrategy extends ReuseStrategy {
>     @SuppressWarnings("unchecked")
>     @Override
>     public TokenStreamComponents getReusableComponents(String fieldName) {
>       Map<Analyzer, TokenStreamComponents> componentsPerAnalyzer =
>           (Map<Analyzer, TokenStreamComponents>) getStoredValue();
>       return componentsPerAnalyzer != null
>           ? componentsPerAnalyzer.get(analyzers.get(fieldName)) : null;
>     }
>
>     @SuppressWarnings("unchecked")
>     @Override
>     public void setReusableComponents(String fieldName,
>                                       TokenStreamComponents components) {
>       Map<Analyzer, TokenStreamComponents> componentsPerAnalyzer =
>           (Map<Analyzer, TokenStreamComponents>) getStoredValue();
>       if (componentsPerAnalyzer == null) {
>         componentsPerAnalyzer = new HashMap<Analyzer, TokenStreamComponents>();
>         setStoredValue(componentsPerAnalyzer);
>       }
>       componentsPerAnalyzer.put(analyzers.get(fieldName), components);
>     }
>   }
>
>   // field name -> analyzer; per instance, filled once in the constructor
>   protected final HashMap<String, Analyzer> analyzers;
>
>   SolrIndexAnalyzer() {
>     super(new SolrFieldReuseStrategy());
>     analyzers = analyzerCache();
>   }
>
>   protected HashMap<String, Analyzer> analyzerCache() {
>     HashMap<String, Analyzer> cache = new HashMap<String, Analyzer>();
>     for (SchemaField f : getFields().values()) {
>       cache.put(f.getName(), f.getType().getAnalyzer());
>     }
>     return cache;
>   }
>
>   @Override
>   protected Analyzer getWrappedAnalyzer(String fieldName) {
>     Analyzer analyzer = analyzers.get(fieldName);
>     return analyzer != null ? analyzer
>         : getDynamicFieldType(fieldName).getAnalyzer();
>   }
>
>   @Override
>   protected TokenStreamComponents wrapComponents(String fieldName,
>                                                  TokenStreamComponents components) {
>     return components;
>   }
> }
>
> private class SolrQueryAnalyzer extends SolrIndexAnalyzer {
>   @Override
>   protected HashMap<String, Analyzer> analyzerCache() {
>     HashMap<String, Analyzer> cache = new HashMap<String, Analyzer>();
>     for (SchemaField f : getFields().values()) {
>       cache.put(f.getName(), f.getType().getQueryAnalyzer());
>     }
>     return cache;
>   }
>
>   @Override
>   protected Analyzer getWrappedAnalyzer(String fieldName) {
>     Analyzer analyzer = analyzers.get(fieldName);
>     return analyzer != null ? analyzer
>         : getDynamicFieldType(fieldName).getQueryAnalyzer();
>   }
> }