[ https://issues.apache.org/jira/browse/LUCENE-6682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Rowe resolved LUCENE-6682.
--------------------------------
       Resolution: Fixed
         Assignee: Steve Rowe
    Fix Version/s: Trunk
                   5.3

Committed to trunk and branch_5x.

Thanks for reporting, Piotr!

> StandardTokenizer performance bug: buffer is unnecessarily copied when maxTokenLength doesn't change
> ----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-6682
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6682
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Steve Rowe
>            Assignee: Steve Rowe
>             Fix For: 5.3, Trunk
>
>
> From Piotr Idzikowski on java-user mailing list [http://markmail.org/message/af26kr7fermt2tfh]:
> {quote}
> I am developing own analyzer based on StandardAnalyzer.
> I realized that tokenizer.setMaxTokenLength is called many times.
> {code:java}
> protected TokenStreamComponents createComponents(final String fieldName, final Reader reader) {
>     final StandardTokenizer src = new StandardTokenizer(getVersion(), reader);
>     src.setMaxTokenLength(maxTokenLength);
>     TokenStream tok = new StandardFilter(getVersion(), src);
>     tok = new LowerCaseFilter(getVersion(), tok);
>     tok = new StopFilter(getVersion(), tok, stopwords);
>     return new TokenStreamComponents(src, tok) {
>         @Override
>         protected void setReader(final Reader reader) throws IOException {
>             src.setMaxTokenLength(StandardAnalyzer.this.maxTokenLength);
>             super.setReader(reader);
>         }
>     };
> }
> {code}
> Does it make sense if length stays the same? I see it finally calls this one( in StandardTokenizerImpl ):
> {code:java}
> public final void setBufferSize(int numChars) {
>     ZZ_BUFFERSIZE = numChars;
>     char[] newZzBuffer = new char[ZZ_BUFFERSIZE];
>     System.arraycopy(zzBuffer, 0, newZzBuffer, 0, Math.min(zzBuffer.length, ZZ_BUFFERSIZE));
>     zzBuffer = newZzBuffer;
> }
> {code}
> So it just copies old array content into the new one.
> {quote}
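
The observation quoted above points at a simple guard: when the requested size equals the current buffer length, the allocate-and-copy can be skipped. What follows is a minimal sketch of that idea, not the patch committed for this issue. The class BufferResizeSketch, its allocation counter, and its accessor methods are hypothetical names introduced only for illustration; they are not part of Lucene or of the generated StandardTokenizerImpl.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a scanner that owns a resizable char buffer. */
public class BufferResizeSketch {
    private char[] buffer;
    private int allocations = 0; // counts real reallocations, for illustration only

    public BufferResizeSketch(int initialSize) {
        buffer = new char[initialSize];
    }

    /** Resizes the buffer, skipping the allocate-and-copy when the size is unchanged. */
    public void setBufferSize(int numChars) {
        if (numChars == buffer.length) {
            return; // same size: no new array, no copy
        }
        // Arrays.copyOf keeps the overlapping prefix and truncates or zero-pads the rest.
        buffer = Arrays.copyOf(buffer, numChars);
        allocations++;
    }

    public int bufferLength() {
        return buffer.length;
    }

    public int allocations() {
        return allocations;
    }

    public static void main(String[] args) {
        BufferResizeSketch scanner = new BufferResizeSketch(255);
        for (int i = 0; i < 1000; i++) {
            scanner.setBufferSize(255); // repeated call with an unchanged size
        }
        scanner.setBufferSize(512); // a genuine change still reallocates
        System.out.println(scanner.allocations() + " allocation(s), length " + scanner.bufferLength());
    }
}
```

The same check could equally sit one level up, in the caller that forwards the maximum token length, so that an unchanged value never reaches the buffer at all. Which placement the actual fix uses is not stated in this message.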