[ https://issues.apache.org/jira/browse/LUCENE-5400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106685#comment-14106685 ]

ASF subversion and git services commented on LUCENE-5400:
---------------------------------------------------------

Commit 1619730 from [~sar...@syr.edu] in branch 'dev/trunk'
[ https://svn.apache.org/r1619730 ]

LUCENE-5897, LUCENE-5400: JFlex-based tokenizers StandardTokenizer and 
UAX29URLEmailTokenizer tokenize extremely slowly over long sequences of text 
partially matching certain grammar rules.  The scanner default buffer size was 
reduced, and scanner buffer growth was disabled, resulting in much, much faster 
tokenization for these text sequences.
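The following toy model shows why capping the scanner buffer helps. It is a hypothetical sketch, not the JFlex-generated `UAX29URLEmailTokenizerImpl` code. It assumes a longest-match scanner that, at each word start, keeps reading characters that could belong to an email local-part (simplified here to letters, digits and `.`) hoping to reach an `@`. When no `@` arrives, it backs off to the last accepting position (the end of a plain word), emits that short token, and starts over. With unbounded lookahead, each restart rescans the rest of the run, so the work grows roughly quadratically. A positive `maxLookahead` stands in for a fixed, non-growing buffer and bounds the rescan. The class name, method names and the charset are assumptions made for illustration.

```java
/**
 * Hypothetical toy model (not Lucene's scanner): counts characters examined by a
 * longest-match scanner that looks ahead for an '@' which never arrives, then
 * backs off to the end of the plain word it could have accepted.
 */
final class LookaheadCostDemo {

    // Simplified local-part charset; the real grammar accepts more characters.
    static boolean isLocalPartChar(char c) {
        return Character.isLetterOrDigit(c) || c == '.';
    }

    /**
     * Returns the characters examined during lookahead passes plus separators skipped.
     * maxLookahead <= 0 models an unbounded, growable buffer; a positive value
     * models a fixed buffer that caps how far the scanner can read ahead.
     */
    static long charsExamined(String text, int maxLookahead) {
        final int n = text.length();
        long examined = 0;
        int pos = 0;
        while (pos < n) {
            if (!Character.isLetterOrDigit(text.charAt(pos))) {
                examined++; // separator: look at it once and skip it
                pos++;
                continue;
            }
            int limit = (maxLookahead > 0) ? Math.min(n, pos + maxLookahead) : n;

            // Last accepting position: end of the plain word starting at pos (within the cap).
            int lastAccept = pos;
            while (lastAccept < limit && Character.isLetterOrDigit(text.charAt(lastAccept))) {
                lastAccept++;
            }

            // Longest-match lookahead: keep consuming local-part characters hoping for '@'.
            int i = pos;
            while (i < limit && isLocalPartChar(text.charAt(i))) {
                i++;
            }
            examined += i - pos;
            if (i < limit && text.charAt(i) == '@') {
                lastAccept = i + 1; // toy rule: accept "local-part@" as one token
            }
            pos = lastAccept; // emit the token and restart scanning right after it
        }
        return examined;
    }

    public static void main(String[] args) {
        String pathological = "a.".repeat(1000); // 2000 chars, no '@' anywhere
        System.out.println("unbounded lookahead:    " + charsExamined(pathological, 0));
        System.out.println("lookahead capped at 10: " + charsExamined(pathological, 10));
    }
}
```

For the 2000-character input, the unbounded model examines 1,002,000 characters, while the capped model examines 10,980. In this model, the cap also splits any word longer than `maxLookahead` into several tokens. That is the trade-off for bounding the rescan cost.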

> Long text matching email local-part rule in UAX29URLEmailTokenizer causes extremely slow tokenization
> -----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-5400
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5400
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 4.5
>            Reporter: Chris Geeringh
>            Assignee: Steve Rowe
>
> This is a pretty nasty bug: it causes the cluster to stop accepting updates.
> I'm not sure how to reproduce it consistently, but I have done so numerous
> times. Switching to a whitespace tokenizer improved indexing speed, and I
> never hit the issue again.
> I'm running a 4.6 snapshot. I had issues with deadlocks with numerous
> versions of Solr, and have finally narrowed the problem down to this code,
> which affects many (all?) versions of Solr.
> When a thread hits this issue it uses 100% CPU. Restarting the affected node
> allows indexing to continue until the issue is hit again. Here is the thread
> dump:
> http-bio-8080-exec-45 (201)
>     org.apache.lucene.analysis.standard.UAX29URLEmailTokenizerImpl.getNextToken(UAX29URLEmailTokenizerImpl.java:4343)
>     org.apache.lucene.analysis.standard.UAX29URLEmailTokenizer.incrementToken(UAX29URLEmailTokenizer.java:147)
>     org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:82)
>     org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:54)
>     org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174)
>     org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:248)
>     org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:253)
>     org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:453)
>     org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1517)
>     org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:217)
>     org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
>     org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
>     org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:583)
>     org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:719)
>     org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:449)
>     org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:89)
>     org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:151)
>     org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:131)
>     org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:221)
>     org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:116)
>     org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186)
>     org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:112)
>     org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:158)
>     org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:99)
>     org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
>     org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>     org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>     org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>     org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
>     org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
>     org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
>     org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
>     org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
>     org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
>     org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
>     org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
>     org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
>     org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
>     org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
>     org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
>     org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
>     org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
>     org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
>     org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:312)
>     java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>     java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     java.lang.Thread.run(Unknown Source)



--
This message was sent by Atlassian JIRA
(v6.2#6252)
