[ https://issues.apache.org/jira/browse/SOLR-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vitaliy Zhovtyuk updated SOLR-3881:
-----------------------------------

Attachment: SOLR-3881.patch

About moving concatFields() to the tika language identifier: I think the way to go is just move the whole method there, then change the detectLanguage() method to take the SolrInputDocument instead of a String. You don't need to carry over the field[] parameter from concatFields(), since data member inputFields will be accessible everywhere it's needed.
[VZ] The call looks cleaner now. I also made inputFields private to reduce its visibility scope.

I should have mentioned previously: I don't like the maxAppendSize and maxTotalAppendSize names - "size" is ambiguous (could refer to bytes, chars, whatever), and "append" refers to an internal operation... I'd like to see "append"=>"field value" and "size"=>"chars": maxFieldValueChars, and maxTotalChars (since appending doesn't need to be mentioned for the global limit). The same thing goes for the default constants and the test method names.
[VZ] Renamed the parameters and test methods.

Some minor issues I found with your patch:

As I said previously: "We should also set default maxima for both per-value and total chars, rather than MAX_INT, as in the current patch." The total chars default should be its own setting; I was thinking we could make it double the per-value default?
[VZ] Added a default value for maxTotalChars and set both defaults to 10K, matching com.cybozu.labs.langdetect.Detector.maxLength.

It's better not to reorder import statements unless you're already making significant changes to them; it distracts from the meat of the change.
(You reordered them in LangDetectLanguageIdentifierUpdateProcessor and LanguageIdentifierUpdateProcessorFactoryTestCase)
[VZ] The IDE's import optimization had sorted them alphabetically; I restored the original order.

In LanguageIdentifierUpdateProcessor.concatFields(), when you trim the concatenated text to maxTotalAppendSize, I think StringBuilder.setLength(maxTotalAppendSize); would be more efficient than StringBuilder.delete(maxTotalAppendSize, sb.length() - 1);
[VZ] Yes, cleaned that up.

In addition to the test you added for the global limit, we should also test using both the per-value and global limits at the same time.
[VZ] Added tests that apply both limits together.

> frequent OOM in LanguageIdentifierUpdateProcessor
> -------------------------------------------------
>
> Key: SOLR-3881
> URL: https://issues.apache.org/jira/browse/SOLR-3881
> Project: Solr
> Issue Type: Bug
> Components: update
> Affects Versions: 4.0
> Environment: CentOS 6.x, JDK 1.6, (java -server -Xms2G -Xmx2G -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=....)
> Reporter: Rob Tulloh
> Fix For: 4.9, 5.0
>
> Attachments: SOLR-3881.patch, SOLR-3881.patch, SOLR-3881.patch, SOLR-3881.patch
>
>
> We are seeing frequent failures from Solr causing it to OOM.
> Here is the stack trace we observe when this happens:
> {noformat}
> Caused by: java.lang.OutOfMemoryError: Java heap space
>     at java.util.Arrays.copyOf(Arrays.java:2882)
>     at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
>     at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
>     at java.lang.StringBuffer.append(StringBuffer.java:224)
>     at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.concatFields(LanguageIdentifierUpdateProcessor.java:286)
>     at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.process(LanguageIdentifierUpdateProcessor.java:189)
>     at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.processAdd(LanguageIdentifierUpdateProcessor.java:171)
>     at org.apache.solr.handler.BinaryUpdateRequestHandler$2.update(BinaryUpdateRequestHandler.java:90)
>     at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:140)
>     at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:120)
>     at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:221)
>     at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:105)
>     at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186)
>     at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:112)
>     at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:147)
>     at org.apache.solr.handler.BinaryUpdateRequestHandler.parseAndLoadDocs(BinaryUpdateRequestHandler.java:100)
>     at org.apache.solr.handler.BinaryUpdateRequestHandler.access$000(BinaryUpdateRequestHandler.java:47)
>     at org.apache.solr.handler.BinaryUpdateRequestHandler$1.load(BinaryUpdateRequestHandler.java:58)
>     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
>     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
>     at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
>     at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
>     at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
>     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
>     at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
>     at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
>     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
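The stack trace shows concatFields() growing a StringBuffer until the heap is exhausted. The per-value and total character limits discussed in the review guard against exactly that. Below is a minimal sketch of that bounded concatenation, under stated assumptions:

- The class name BoundedConcat is hypothetical.
- A plain Map<String, List<Object>> stands in for SolrInputDocument.
- Only String values are concatenated.
- The 10K defaults follow the [VZ] reply.

It is not the code in SOLR-3881.patch. The sketch trims with setLength(), as the review suggests. Note that StringBuilder.delete(start, end) treats end as exclusive, so a call like delete(max, sb.length() - 1) would leave the final character in place.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of bounded field concatenation for language detection.
 * Assumption: a Map<String, List<Object>> stands in for SolrInputDocument;
 * this is not the code from SOLR-3881.patch.
 */
public class BoundedConcat {
  // 10K defaults, following the value mentioned in the review thread.
  static final int DEFAULT_MAX_FIELD_VALUE_CHARS = 10000;
  static final int DEFAULT_MAX_TOTAL_CHARS = 10000;

  private final String[] inputFields;
  private final int maxFieldValueChars;
  private final int maxTotalChars;

  BoundedConcat(String[] inputFields) {
    this(inputFields, DEFAULT_MAX_FIELD_VALUE_CHARS, DEFAULT_MAX_TOTAL_CHARS);
  }

  BoundedConcat(String[] inputFields, int maxFieldValueChars, int maxTotalChars) {
    this.inputFields = inputFields;
    this.maxFieldValueChars = maxFieldValueChars;
    this.maxTotalChars = maxTotalChars;
  }

  /** Joins the String values of inputFields, honoring both character limits. */
  String concatFields(Map<String, List<Object>> doc) {
    StringBuilder sb = new StringBuilder();
    for (String field : inputFields) {
      List<Object> values = doc.get(field);
      if (values == null) {
        continue; // field absent from this document
      }
      for (Object value : values) {
        if (!(value instanceof String)) {
          continue; // assumption: only text values are useful for detection
        }
        String text = (String) value;
        // Per-value limit: copy at most maxFieldValueChars from this value.
        sb.append(text, 0, Math.min(text.length(), maxFieldValueChars)).append(' ');
        // Global limit: once reached, truncate in place and stop reading values.
        if (sb.length() >= maxTotalChars) {
          sb.setLength(maxTotalChars);
          return sb.toString();
        }
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    Map<String, List<Object>> doc = new LinkedHashMap<>();
    doc.put("title", Arrays.<Object>asList("abcdefghij"));
    doc.put("body", Arrays.<Object>asList("0123456789", "klmnop"));

    // Both limits active at once: 4 chars per value, 12 chars in total.
    BoundedConcat both = new BoundedConcat(new String[] {"title", "body"}, 4, 12);
    System.out.println(both.concatFields(doc)); // prints "abcd 0123 kl"

    // delete(start, end) excludes end, so passing length() - 1 keeps the last char.
    StringBuilder viaDelete = new StringBuilder("abcdef");
    viaDelete.delete(3, viaDelete.length() - 1);
    StringBuilder viaSetLength = new StringBuilder("abcdef");
    viaSetLength.setLength(3);
    System.out.println(viaDelete + " vs " + viaSetLength); // prints "abcf vs abc"
  }
}
```

The main() method applies both limits together, which is the combination the review asks to have tested. The total limit is checked after each append, so the buffer can exceed maxTotalChars by at most one truncated value plus a separator before it is trimmed. Memory use therefore stays bounded either way.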