On Fri, Jul 24, 2009 at 11:41 AM, luther blisset<sabri.br...@gmail.com> wrote: > > Hi folks, > I just upgrading Hibernate Search library of my app and so I had to upgrade > Lucene too and pass from 2.2 to 2.4 version. > In Lucene 2.4 the ISOLatin1AccentFilter class has changed and I can't figure > how it works. > I use a TwoWayFieldBridge to index the data and this is my set method: > > public void set(String s, Object o, Document document, Field.Store store, > Field.Index index, Float aFloat){ > > //MyObject has a field name > MyObject objectToIndex; > > //casting from Object to MyObject > try{ > objectToIndex = MyObject.class.cast(o); > }catch(ClassCastException cEx ){} > > > > if (objectToIndex.getName() != null) { > > ISOLatin1AccentFilter filter = new ISOLatin1AccentFilter(new > StandardTokenizer(new StringReader(objectToIndex.getName()))); > filter.removeAccents(objectToIndex.getName().toCharArray(), > objectToIndex.getName().length()); > Field name = new Field( "name", > String.valueOf(objectToIndex.getName()).toLowerCase() , Field.Store.YES, > Field.Index.UN_TOKENIZED ); > > document.add(name); > } > } > I do not really understand what you are trying to do. do you just wanna remove the accents from the string and index it without passing it through an analyzer?! (Field.Index.UN_TOKENIZED will not pass the field value to an analyzer). do you wanna index this without an analyzer?!
If you pass an array to ISOLantin1AccentFilter#removeAccents() the processed chars will be written to an private internal char array inside the ISOLantin1AccentFilter. You can not use the removeAccents method just removing the accents. what you could do as a dirty workaround is the following: String foo = "HÄllo HÄllo HÄllo HÄllo HÄllo"; ISOLatin1AccentFilter filter = new ISOLatin1AccentFilter( new Tokenizer(new StringReader(foo)){ private boolean isRead = false; public Token next(final Token reusableToken) throws IOException { if(isRead){ return null; } BufferedReader reader = new BufferedReader(this.input); StringBuilder builder = new StringBuilder(); char[] buffer = new char[1024]; int read = -1; while((read = reader.read(buffer)) > 0){ builder.append(buffer, 0, read); } reusableToken.setTermText(builder.toString()); isRead = true; return reusableToken; } }); Token t = filter.next(); String foo_without_accents = t.term(); System.out.println(foo_without_accents); yields: HAllo HAllo HAllo HAllo HAllo simon > > but it doesn't work. And if pass an accented word for the property > objectToIndex.getName(), it remains with accent :( > I think there is something wrong in my code when I create the new instance > of ISOLatin1AccentFilter but I can' t get it works properly. > Could someone help me? > thanks a lot > -- > View this message in context: > http://www.nabble.com/Removing-diacritics-with-ISOLatin1AccentFilter-tp24641618p24641618.html > Sent from the Lucene - Java Users mailing list archive at Nabble.com. > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org