[ https://issues.apache.org/jira/browse/SOLR-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885658#action_12885658 ]
Hoss Man commented on SOLR-1283:
--------------------------------

As I mentioned in IRC (prior to Grant's previously posted comments), the core issue is: what is the intended purpose of the "numRead" counter?

* If it's supposed to count the number of times "input.read()" is called (i.e. "num read from inner stream"), then "peek" has a bug by not incrementing it.
* If it's supposed to count the number of times "next()" returns a char (i.e. "num read from outer stream"), then, as Grant mentioned, "next" has a bug by not incrementing it when using the stack.

The patch currently assumes the former and seems to fix the bug. I haven't tried the same test case with an approach based on the latter, but I suspect that may also work.

> Mark Invalid error on indexing
> ------------------------------
>
>                 Key: SOLR-1283
>                 URL: https://issues.apache.org/jira/browse/SOLR-1283
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 1.3
>         Environment: Ubuntu 8.04, Sun Java 6
>            Reporter: solrize
>             Fix For: 3.1, 4.0
>
>         Attachments: SOLR-1283.modules.patch, SOLR-1283.patch
>
>
> When indexing large (1 megabyte) documents I get a lot of exceptions with
> stack traces like the below. It happens both in the Solr 1.3 release and in
> the July 9 1.4 nightly. I believe this to NOT be the same issue as SOLR-42.
> I found some further discussion on solr-user:
> http://www.nabble.com/IOException:-Mark-invalid-while-analyzing-HTML-td17052153.html
>
> In that discussion, Grant asked the original poster to open a Jira issue, but
> I didn't see one so I'm opening one; please feel free to merge or close if
> it's redundant.
> My stack trace follows.
> Jul 15, 2009 8:36:42 AM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/update params={} status=500 QTime=3
> Jul 15, 2009 8:36:42 AM org.apache.solr.common.SolrException log
> SEVERE: java.io.IOException: Mark invalid
>       at java.io.BufferedReader.reset(BufferedReader.java:485)
>       at org.apache.solr.analysis.HTMLStripReader.restoreState(HTMLStripReader.java:171)
>       at org.apache.solr.analysis.HTMLStripReader.read(HTMLStripReader.java:728)
>       at org.apache.solr.analysis.HTMLStripReader.read(HTMLStripReader.java:742)
>       at java.io.Reader.read(Reader.java:123)
>       at org.apache.lucene.analysis.CharTokenizer.next(CharTokenizer.java:108)
>       at org.apache.lucene.analysis.StopFilter.next(StopFilter.java:178)
>       at org.apache.lucene.analysis.standard.StandardFilter.next(StandardFilter.java:84)
>       at org.apache.lucene.analysis.LowerCaseFilter.next(LowerCaseFilter.java:53)
>       at org.apache.solr.analysis.WordDelimiterFilter.next(WordDelimiterFilter.java:347)
>       at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:159)
>       at org.apache.lucene.index.DocFieldConsumersPerField.processFields(DocFieldConsumersPerField.java:36)
>       at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:234)
>       at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:765)
>       at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:748)
>       at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2512)
>       at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2484)
>       at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:240)
>       at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
>       at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
>       at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
>       at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>       at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>       at org.apache.solr.core.SolrCore.execute(SolrCore.java:1292)
>       at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>       at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>       at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
>       at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
>       at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>       at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>       at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
>       at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
>       at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
>       at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>       at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
>       at org.mortbay.jetty.Server.handle(Server.java:285)
>       at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
>       at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
>       at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
>       at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
>       at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
>       at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
>       at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
> Thanks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
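For context on the top frame of the trace: the "Mark invalid" message comes from java.io.BufferedReader.reset() when the mark set by mark(readAheadLimit) has been discarded because more than readAheadLimit characters were read. The following is a minimal standalone sketch of that JDK behaviour only. It does not reproduce HTMLStripReader; the class name, helper name, sample text and the deliberately tiny buffer size are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class MarkInvalidDemo {
    /**
     * Marks a BufferedReader, reads charsToRead characters, then calls reset().
     * Returns "ok" if reset() succeeds, otherwise the IOException message.
     * The small bufferSize forces a buffer refill, which is the point at which
     * BufferedReader drops a mark whose read-ahead limit has been exceeded.
     */
    static String tryReset(int bufferSize, int readAheadLimit, int charsToRead) {
        try (BufferedReader in =
                 new BufferedReader(new StringReader("0123456789abcdef"), bufferSize)) {
            in.mark(readAheadLimit);
            for (int i = 0; i < charsToRead; i++) {
                in.read();
            }
            in.reset();
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Stays within the read-ahead limit: reset() is guaranteed to work.
        System.out.println(tryReset(4, 8, 6));
        // Reads past the limit and past the buffer: reset() is expected to fail.
        System.out.println(tryReset(4, 2, 6));
    }
}
```

Any wrapper that calls mark() and later reset() therefore has to know how many characters it has pulled from the underlying reader since the mark, which is presumably why the meaning of a read counter matters in this issue.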
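The two readings of "numRead" in Hoss Man's comment can be made concrete with a toy reader that keeps both counts separately. This is a hypothetical sketch, not the HTMLStripReader source: the class CounterSketch, its fields innerReads and returnedChars, and the Deque used as the pushback stack are assumptions; only the method names peek() and next() and the call input.read() are taken from the comment.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;

public class CounterSketch {
    private final Reader input;
    private final Deque<Integer> pushed = new ArrayDeque<>(); // peeked, not yet consumed

    int innerReads = 0;    // reading 1: number of calls to input.read()
    int returnedChars = 0; // reading 2: number of chars handed out by next()

    CounterSketch(Reader input) {
        this.input = input;
    }

    /** Returns the next char, taking it from the stack first if one was peeked. */
    int next() throws IOException {
        int ch;
        if (!pushed.isEmpty()) {
            ch = pushed.pop();      // no inner read happens here
        } else {
            ch = input.read();
            innerReads++;
        }
        if (ch != -1) {
            returnedChars++;        // counted on both paths
        }
        return ch;
    }

    /** Looks at the next char without consuming it. */
    int peek() throws IOException {
        if (pushed.isEmpty()) {
            pushed.push(input.read());
            innerReads++;           // the inner stream has advanced
        }
        return pushed.peek();
    }

    /** Helper for the demo: both counters right after a single peek(). */
    static int[] countsAfterPeek(String text) {
        CounterSketch r = new CounterSketch(new StringReader(text));
        try {
            r.peek();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new int[] { r.innerReads, r.returnedChars };
    }

    public static void main(String[] args) {
        int[] c = countsAfterPeek("ab");
        System.out.println("innerReads=" + c[0] + " returnedChars=" + c[1]);
    }
}
```

In this sketch the two counters disagree exactly while a peeked character is sitting on the stack, so a single counter that is incremented in only some of these places matches neither definition.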