[ https://issues.apache.org/jira/browse/LUCENE-7538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16202705#comment-16202705 ]
Michael McCandless commented on LUCENE-7538: -------------------------------------------- @nkeet see {{IndexWriter.MAX_STORED_STRING_LENGTH}} for the limit; you'll have to either not store that huge file, or break it up across several documents. > Uploading large text file to a field causes "this IndexWriter is closed" error > ------------------------------------------------------------------------------ > > Key: LUCENE-7538 > URL: https://issues.apache.org/jira/browse/LUCENE-7538 > Project: Lucene - Core > Issue Type: Bug > Affects Versions: 5.5.1 > Reporter: Steve Chen > Fix For: 6.4, 7.0 > > Attachments: LUCENE-7538.patch, LUCENE-7538.patch > > > We have seen "this IndexWriter is closed" error after we tried to upload a > large text file to a single Solr text field. The field definition in the > schema.xml is: > {noformat} > <field name="fileContent" type="text_general" indexed="true" stored="true" > termVectors="true" termPositions="true" termOffsets="true"/> > {noformat} > After that, the IndexWriter remained closed and couldn't be recovered until > we reloaded the Solr core. The text file had size of 800MB, containing only > numbers and English characters. > Stack trace is shown below: > {noformat} > 2016-11-02 23:00:17,913 [http-nio-19082-exec-3] ERROR > org.apache.solr.handler.RequestHandlerBase - > org.apache.solr.common.SolrException: Exception writing document id 1487_0_1 > to the index; possible analysis error. > at > org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:180) > at > org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:68) > at > org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:934) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1089) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:712) > at > org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103) > at > org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:126) > at > org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:131) > at > org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:237) > at > org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2082) > at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:651) > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:458) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:229) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:184) > at > veeva.ecm.common.interfaces.web.SolrDispatchOverride.doFilter(SolrDispatchOverride.java:43) > at > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239) > at > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) > at > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212) > at > org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106) > at > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141) > at > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79) > at > org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:616) > at > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88) > at > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:521) > at > org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1096) > at > org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:674) > at > org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1500) > at > org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1456) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at > org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter > is closed > at > org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:720) > at > org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:734) > at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1473) > at > org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:282) > at > org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:214) > at > org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:169) > ... 34 more > Caused by: java.lang.ArrayIndexOutOfBoundsException: 56 > at > org.apache.lucene.util.UnicodeUtil.UTF16toUTF8(UnicodeUtil.java:201) > at > org.apache.lucene.util.UnicodeUtil.UTF16toUTF8(UnicodeUtil.java:183) > at > org.apache.lucene.codecs.compressing.GrowableByteArrayDataOutput.writeString(GrowableByteArrayDataOutput.java:72) > at > org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.writeField(CompressingStoredFieldsWriter.java:292) > at > org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:382) > at > org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:321) > at > org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:234) > at > org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:450) > at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1477) > ... 37 more > {noformat} > We debugged and traced down the issue. It was an integer overflow problem > that was not properly handled. The > GrowableByteArrayDataOutput::writeString(String string) method is shown below: > {noformat} > @Override > public void writeString(String string) throws IOException { > int maxLen = string.length() * UnicodeUtil.MAX_UTF8_BYTES_PER_CHAR; > if (maxLen <= MIN_UTF8_SIZE_TO_ENABLE_DOUBLE_PASS_ENCODING) { > // string is small enough that we don't need to save memory by falling > back to double-pass approach > // this is just an optimized writeString() that re-uses scratchBytes. > scratchBytes = ArrayUtil.grow(scratchBytes, maxLen); > int len = UnicodeUtil.UTF16toUTF8(string, 0, string.length(), > scratchBytes); > writeVInt(len); > writeBytes(scratchBytes, len); > } else { > // use a double pass approach to avoid allocating a large intermediate > buffer for string encoding > int numBytes = UnicodeUtil.calcUTF16toUTF8Length(string, 0, > string.length()); > writeVInt(numBytes); > bytes = ArrayUtil.grow(bytes, length + numBytes); > length = UnicodeUtil.UTF16toUTF8(string, 0, string.length(), bytes, > length); > } > } > {noformat} > The 800MB text file stored in the string parameter of the method had a length > of 800 million, the maxLen became negative integer as the result of the > length times 3. The negative integer was then passed into > ArrayUtil.grow(scratchBytes, maxLen): > {noformat} > public static byte[] grow(byte[] array, int minSize) { > assert minSize >= 0: "size must be positive (got " + minSize + "): likely > integer overflow?"; > if (array.length < minSize) { > byte[] newArray = new byte[oversize(minSize, 1)]; > System.arraycopy(array, 0, newArray, 0, array.length); > return newArray; > } else > return array; > } > {noformat} > Assertion was disabled in production so the execution won't stop. The > original array was returned from the method call without increasing the size, > which caused an ArrayIndexOutOfBoundsException to be thrown. The > ArrayIndexOutOfBoundsException was wrapped into AbortingException, and later > on caused the IndexWriter to be closed in IndexWriter class. > The code should fail faster with a more-specific error for the integer > overflow problem, and shouldn't cause the IndexWriter to be closed. -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org