[jira] Updated: (SOLR-1283) Mark Invalid error on indexing

2010-07-02 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-1283:
---

Fix Version/s: 3.1, 4.0

we have a patch that *seems* to work, so we should definitely try to get this 
into the next release ... i'm hoping someone more familiar with the code can 
sanity check it soon.

 Mark Invalid error on indexing
 --

              Key: SOLR-1283
              URL: https://issues.apache.org/jira/browse/SOLR-1283
          Project: Solr
       Issue Type: Bug
 Affects Versions: 1.3
      Environment: Ubuntu 8.04, Sun Java 6
         Reporter: solrize
          Fix For: 3.1, 4.0

      Attachments: SOLR-1283.modules.patch, SOLR-1283.patch


 When indexing large (1 megabyte) documents I get a lot of exceptions with 
 stack traces like the one below.  It happens both in the Solr 1.3 release and in 
 the July 9 1.4 nightly.  I believe this to NOT be the same issue as SOLR-42.  
 I found some further discussion on solr-user: 
 http://www.nabble.com/IOException:-Mark-invalid-while-analyzing-HTML-td17052153.html
  
 In that discussion, Grant asked the original poster to open a Jira issue, but 
 I didn't see one so I'm opening one; please feel free to merge or close if 
 it's redundant. 
 My stack trace follows.
 Jul 15, 2009 8:36:42 AM org.apache.solr.core.SolrCore execute
 INFO: [] webapp=/solr path=/update params={} status=500 QTime=3
 Jul 15, 2009 8:36:42 AM org.apache.solr.common.SolrException log
 SEVERE: java.io.IOException: Mark invalid
     at java.io.BufferedReader.reset(BufferedReader.java:485)
     at org.apache.solr.analysis.HTMLStripReader.restoreState(HTMLStripReader.java:171)
     at org.apache.solr.analysis.HTMLStripReader.read(HTMLStripReader.java:728)
     at org.apache.solr.analysis.HTMLStripReader.read(HTMLStripReader.java:742)
     at java.io.Reader.read(Reader.java:123)
     at org.apache.lucene.analysis.CharTokenizer.next(CharTokenizer.java:108)
     at org.apache.lucene.analysis.StopFilter.next(StopFilter.java:178)
     at org.apache.lucene.analysis.standard.StandardFilter.next(StandardFilter.java:84)
     at org.apache.lucene.analysis.LowerCaseFilter.next(LowerCaseFilter.java:53)
     at org.apache.solr.analysis.WordDelimiterFilter.next(WordDelimiterFilter.java:347)
     at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:159)
     at org.apache.lucene.index.DocFieldConsumersPerField.processFields(DocFieldConsumersPerField.java:36)
     at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:234)
     at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:765)
     at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:748)
     at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2512)
     at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2484)
     at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:240)
     at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
     at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
     at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1292)
     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
     at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
     at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
     at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
     at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
     at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
     at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
     at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
     at org.mortbay.jetty.Server.handle(Server.java:285)
     at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
   at 
 

[jira] Updated: (SOLR-1283) Mark Invalid error on indexing

2010-07-02 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-1283:
---

Attachment: SOLR-1283.modules.patch

Updates the patch to trunk (where the charfilter code has been refactored into 
the new top-level modules directory).

I'm not familiar with the HTMLStripCharFilter code, so I can't say whether the 
fix is correct (no idea if peek should be incrementing that counter -- 
that's why even private methods should have javadocs), but the test certainly 
looks valid to me.
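
For anyone not familiar with the exception itself: the top frame of the reported stack trace is java.io.BufferedReader.reset, which fails with "Mark invalid" once the reader has consumed more characters than the readAheadLimit passed to mark() and has had to refill its buffer. Below is a minimal standalone sketch of that JDK behaviour only. It does not touch HTMLStripCharFilter and says nothing about whether the attached patch is correct. The class name MarkInvalidDemo, the method markReadReset, the input string and the sizes are all made up for illustration, and the behaviour described is what OpenJDK-based runtimes do (the javadoc only says such a reset "may fail").

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical demo class; not part of Solr, Lucene, or either attached patch.
class MarkInvalidDemo {

    // Arbitrary 16-character sample input, chosen only for illustration.
    private static final String INPUT = "abcdefghijklmnop";

    /**
     * Marks the start of the input with the given read-ahead limit, reads
     * charsToRead characters one at a time, then tries to reset to the mark.
     * Returns "ok" if reset() succeeds, otherwise the IOException message.
     */
    static String markReadReset(int bufferSize, int readAheadLimit, int charsToRead) {
        try (BufferedReader reader = new BufferedReader(new StringReader(INPUT), bufferSize)) {
            reader.mark(readAheadLimit);
            for (int i = 0; i < charsToRead; i++) {
                reader.read();
            }
            reader.reset();
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Stays within the read-ahead limit, so the mark is still usable.
        System.out.println(markReadReset(4, 8, 3));
        // Reads well past the limit across several buffer refills.
        System.out.println(markReadReset(4, 2, 10));
    }
}
```

On an OpenJDK runtime the first call should return "ok" and the second should return "Mark invalid", the same message as in the reported trace. The small buffer size is only there to force a refill quickly; with the default buffer the same thing would only show up on much larger inputs, which would be consistent with the report that it happens on large documents.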


[jira] Updated: (SOLR-1283) Mark Invalid error on indexing

2010-01-26 Thread Julien Coloos (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Coloos updated SOLR-1283:


Attachment: SOLR-1283.patch

The issue also occurs in current trunk (revision 903234) with the class 
{{HTMLStripCharFilter}}, which seems to replace the deprecated {{HTMLStripReader}}.

Example stack trace:
{noformat}
Jan 26, 2010 4:02:56 PM org.apache.solr.common.SolrException log
SEVERE: java.io.IOException: Mark invalid
    at java.io.BufferedReader.reset(BufferedReader.java:485)
    at org.apache.lucene.analysis.CharReader.reset(CharReader.java:63)
    at org.apache.solr.analysis.HTMLStripCharFilter.restoreState(HTMLStripCharFilter.java:172)
    at org.apache.solr.analysis.HTMLStripCharFilter.read(HTMLStripCharFilter.java:734)
    at org.apache.solr.analysis.HTMLStripCharFilter.read(HTMLStripCharFilter.java:748)
    at java.io.Reader.read(Reader.java:122)
    at org.apache.lucene.analysis.CharTokenizer.incrementToken(CharTokenizer.java:77)
    at org.apache.lucene.analysis.ISOLatin1AccentFilter.incrementToken(ISOLatin1AccentFilter.java:43)
    at org.apache.lucene.analysis.TokenStream.next(TokenStream.java:383)
    at org.apache.lucene.analysis.ISOLatin1AccentFilter.next(ISOLatin1AccentFilter.java:64)
    at org.apache.solr.analysis.WordDelimiterFilter.next(WordDelimiterFilter.java:379)
    at org.apache.lucene.analysis.TokenStream.incrementToken(TokenStream.java:318)
    at org.apache.lucene.analysis.StopFilter.incrementToken(StopFilter.java:225)
    at org.apache.lucene.analysis.LowerCaseFilter.incrementToken(LowerCaseFilter.java:38)
    at org.apache.solr.analysis.SnowballPorterFilter.incrementToken(SnowballPorterFilterFactory.java:116)
    at org.apache.lucene.analysis.TokenStream.next(TokenStream.java:406)
    at org.apache.solr.analysis.BufferedTokenStream.read(BufferedTokenStream.java:97)
    at org.apache.solr.analysis.BufferedTokenStream.next(BufferedTokenStream.java:83)
    at org.apache.lucene.analysis.TokenStream.incrementToken(TokenStream.java:321)
    at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:138)
    at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:244)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:781)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:764)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2630)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2602)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:241)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
    at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1317)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:341)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
    at org.mortbay.jetty.Server.handle(Server.java:285)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:723)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at