[ https://issues.apache.org/jira/browse/SOLR-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476584#comment-13476584 ]
Hoss Man commented on SOLR-3881:
--------------------------------

bq. One possible solution is to limit the size of the string that is selected for concatenation.

I don't know if there is any way to make LanguageIdentifierUpdateProcessor more memory efficient (in particular, I'm not sure why it needs to concat the field values instead of operating on them directly), but if you want to give langId just the first N characters of another field, that should already be possible without code changes by wiring together the CloneFieldUpdateProcessorFactory with the TruncateFieldUpdateProcessorFactory.

Something like this should work...

{code}
...
<processor class="solr.CloneFieldUpdateProcessorFactory">
  <str name="source">GIANT_HONKING_STRING_FIELD</str>
  <str name="dest">truncated_string_field_for_lang_detect</str>
</processor>
<processor class="solr.TruncateFieldUpdateProcessorFactory">
  <str name="fieldName">truncated_string_field_for_lang_detect</str>
  <int name="maxLength">65536</int>
</processor>
<processor class="solr.LangDetectLanguageIdentifierUpdateProcessorFactory">
  <!-- <str name="langid.fl">title,subject,GIANT_HONKING_STRING_FIELD</str> -->
  <str name="langid.fl">title,subject,truncated_string_field_for_lang_detect</str>
  ...
</processor>
<processor class="solr.IgnoreFieldUpdateProcessorFactory">
  <str name="fieldName">truncated_string_field_for_lang_detect</str>
</processor>
...
{code}

Neither CloneFieldUpdateProcessorFactory nor TruncateFieldUpdateProcessorFactory will make a full copy of the original String value, and TruncateFieldUpdateProcessorFactory will only make a truncated copy if the source is longer than the configured max (and even then, whether any copy is actually made really just depends on how the JVM implements substring). IgnoreFieldUpdateProcessorFactory will ensure that the truncated copy is freed up for GC as soon as you are done with LangId.
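The quoted suggestion, limiting the size of the string selected for concatenation, can also be illustrated in plain Java. This is only a sketch under stated assumptions: `BoundedConcat` and `concatBounded` are hypothetical names that are not part of Solr, the field values are assumed to be already available as strings, and the code does not reproduce what LanguageIdentifierUpdateProcessor.concatFields actually does. It shows one way to keep the buffer from ever growing past a fixed character budget.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of Solr: joins values under a hard character cap. */
public class BoundedConcat {

    /**
     * Joins the given values with single spaces, stopping once maxChars
     * characters have been collected. The last value may be cut short.
     */
    public static String concatBounded(List<String> values, int maxChars) {
        if (maxChars <= 0) {
            return "";
        }
        // Start small; the builder never needs to grow past maxChars.
        StringBuilder sb = new StringBuilder(Math.min(maxChars, 1024));
        for (String value : values) {
            if (value == null || value.isEmpty()) {
                continue; // nothing to add for this field
            }
            int remaining = maxChars - sb.length();
            if (sb.length() > 0) {
                if (remaining < 2) {
                    break; // no room for a separator plus at least one character
                }
                sb.append(' ');
                remaining--;
            }
            // Append only the prefix that still fits in the budget.
            sb.append(value, 0, Math.min(value.length(), remaining));
            if (sb.length() >= maxChars) {
                break;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("title text", "subject text", "a very long body");
        System.out.println(concatBounded(fields, 16));
    }
}
```

Because only a prefix of each value is appended, the peak size of the buffer is bounded by the cap rather than by the size of the largest field. A cut at an arbitrary character index can split a surrogate pair, which a real implementation would probably want to handle.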
> frequent OOM in LanguageIdentifierUpdateProcessor
> -------------------------------------------------
>
>                 Key: SOLR-3881
>                 URL: https://issues.apache.org/jira/browse/SOLR-3881
>             Project: Solr
>          Issue Type: Bug
>          Components: update
>    Affects Versions: 4.0
>         Environment: CentOS 6.x, JDK 1.6, (java -server -Xms2G -Xmx2G -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=....)
>            Reporter: Rob Tulloh
>
> We are seeing frequent failures from Solr causing it to OOM. Here is the stack trace we observe when this happens:
> {noformat}
> Caused by: java.lang.OutOfMemoryError: Java heap space
>         at java.util.Arrays.copyOf(Arrays.java:2882)
>         at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
>         at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
>         at java.lang.StringBuffer.append(StringBuffer.java:224)
>         at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.concatFields(LanguageIdentifierUpdateProcessor.java:286)
>         at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.process(LanguageIdentifierUpdateProcessor.java:189)
>         at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.processAdd(LanguageIdentifierUpdateProcessor.java:171)
>         at org.apache.solr.handler.BinaryUpdateRequestHandler$2.update(BinaryUpdateRequestHandler.java:90)
>         at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:140)
>         at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:120)
>         at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:221)
>         at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:105)
>         at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186)
>         at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:112)
>         at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:147)
>         at org.apache.solr.handler.BinaryUpdateRequestHandler.parseAndLoadDocs(BinaryUpdateRequestHandler.java:100)
>         at org.apache.solr.handler.BinaryUpdateRequestHandler.access$000(BinaryUpdateRequestHandler.java:47)
>         at org.apache.solr.handler.BinaryUpdateRequestHandler$1.load(BinaryUpdateRequestHandler.java:58)
>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
>         at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
>         at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
>         at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
>         at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
>         at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
>         at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
>         at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
> {noformat}

--
This message is automatically generated by JIRA.