[ https://issues.apache.org/jira/browse/LUCENE-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000913#comment-13000913 ]
Simon Willnauer commented on LUCENE-2944:
-----------------------------------------

IMO this is ICU's problem here. This code should not hand the key.bytes array to the outside world in this particular case, unless it is documented that you must not use or modify the BytesRef you pass to toBytesRef anywhere else.

> BytesRef reuse bugs in QueryParser and analysis.jsp
> ---------------------------------------------------
>
>                 Key: LUCENE-2944
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2944
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>             Fix For: 4.0
>
>         Attachments: LUCENE-2944.patch, LUCENE-2944.patch
>
>
> Some code uses BytesRef as if it were a "String", in this case consumers of
> TermToBytesRefAttribute.
> The thing is, while our general implementation works on char[] and then
> populates the consumer's BytesRef,
> not all TermToBytesRefAttribute implementations do this. Specifically, ICU
> collation reuses the bytes and simply sets the pointers:
> {noformat}
>   @Override
>   public int toBytesRef(BytesRef target) {
>     collator.getRawCollationKey(toString(), key);
>     target.bytes = key.bytes;
>     target.offset = 0;
>     target.length = key.size;
>     return target.hashCode();
>   }
> {noformat}
> Most of the blame falls on me, as I added this to the queryparser in
> LUCENE-2514.
> Attached is a patch so that these consumers re-use a 'spare' and copy the
> bytes when they are going to make a long-lasting object such as a Term.
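
To make the aliasing problem concrete, here is a minimal, self-contained sketch. It does not use Lucene or ICU: Ref, SharingProducer, and collect are hypothetical stand-ins invented for illustration, and only the pointer-sharing behaviour is modelled on the toBytesRef snippet quoted above. It contrasts a consumer that keeps the populated reference as if it were a String with one that copies the bytes before keeping them, which is roughly the approach the issue description attributes to the patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BytesRefReuseSketch {

    // Hypothetical stand-in for a BytesRef-like view: array + offset + length.
    static final class Ref {
        byte[] bytes = new byte[0];
        int offset;
        int length;

        String utf8() {
            return new String(bytes, offset, length, StandardCharsets.UTF_8);
        }

        // Copies the viewed range into a private array, detaching it from the producer.
        Ref deepCopy() {
            Ref copy = new Ref();
            copy.bytes = Arrays.copyOfRange(bytes, offset, offset + length);
            copy.offset = 0;
            copy.length = length;
            return copy;
        }
    }

    // Hypothetical producer that, like the quoted snippet, points the target
    // at its own internal buffer instead of copying into the target.
    static final class SharingProducer {
        private final byte[] key = new byte[16];
        private int size;

        void setTerm(String term) {
            byte[] encoded = term.getBytes(StandardCharsets.UTF_8);
            size = Math.min(encoded.length, key.length);
            System.arraycopy(encoded, 0, key, 0, size);
        }

        void toRef(Ref target) {
            target.bytes = key;   // shares the internal buffer
            target.offset = 0;
            target.length = size;
        }
    }

    // Processes the terms in order and returns what the kept references
    // decode to afterwards. With copy == false the references alias the
    // producer's buffer; with copy == true each one is detached first.
    static List<String> collect(boolean copy, String... terms) {
        SharingProducer producer = new SharingProducer();
        List<Ref> kept = new ArrayList<>();
        for (String term : terms) {
            Ref spare = new Ref();
            producer.setTerm(term);
            producer.toRef(spare);
            kept.add(copy ? spare.deepCopy() : spare);
        }
        List<String> decoded = new ArrayList<>();
        for (Ref ref : kept) {
            decoded.add(ref.utf8());
        }
        return decoded;
    }

    public static void main(String[] args) {
        System.out.println(collect(false, "foo", "bar")); // prints [bar, bar]
        System.out.println(collect(true, "foo", "bar"));  // prints [foo, bar]
    }
}
```

In the non-copying case the first kept reference silently changes when the producer handles the next term, because both references point at the same array. Whether the copy belongs in the consumer (as in the patch) or in the producer (as the comment argues) is the design question under discussion; the sketch only shows why one of them has to do it.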