I'm trying to index all the words without accents. I do the same when querying: I remove the accents and lowercase the search term. Why should I pass the string through the analyzer? What is wrong if I don't, and what are the benefits? I'm just a newbie with Lucene. Thanks a lot for your reply :]
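A minimal sketch of the trade-off being asked about, in plain Java with no Lucene classes. It assumes a hypothetical `normalize` helper (JDK `java.text.Normalizer` decomposition plus lowercasing) applied to both the stored value and the query term, and it simulates two indexing modes with term sets. The whitespace split only stands in for a real analyzer's tokenizer, and `Café de Flore` is a made-up value. What it illustrates: with Field.Index.UN_TOKENIZED the whole value becomes a single term, so a query for one word of it cannot match, whereas an analyzer both splits and normalizes, and reusing the same analyzer at query time keeps both sides consistent.

```java
import java.text.Normalizer;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

public class NormalizeBothSides {

    // Hypothetical helper: NFD-decompose, drop combining marks, lowercase.
    // JDK-only; this is not a Lucene class.
    static String normalize(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    // Simulates Field.Index.UN_TOKENIZED: the whole value is one term.
    static Set<String> untokenizedTerms(String value) {
        Set<String> terms = new LinkedHashSet<String>();
        terms.add(normalize(value));
        return terms;
    }

    // Simulates an analyzer: split on whitespace, then normalize each token.
    // A real Lucene tokenizer splits on more than whitespace (assumption kept simple).
    static Set<String> analyzedTerms(String value) {
        Set<String> terms = new LinkedHashSet<String>();
        for (String token : value.trim().split("\\s+")) {
            terms.add(normalize(token));
        }
        return terms;
    }

    public static void main(String[] args) {
        String name = "Caf\u00e9 de Flore";  // hypothetical field value
        String query = normalize("FLORE");   // query term normalized the same way

        System.out.println(untokenizedTerms(name));                 // [cafe de flore]
        System.out.println(analyzedTerms(name));                    // [cafe, de, flore]
        System.out.println(untokenizedTerms(name).contains(query)); // false
        System.out.println(analyzedTerms(name).contains(query));    // true
    }
}
```

Manual normalization on both sides does keep accents consistent; what the un-tokenized field gives up is matching on individual words.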
Simon Willnauer wrote:
>
> On Fri, Jul 24, 2009 at 11:41 AM, luther blisset <sabri.br...@gmail.com> wrote:
>>
>> Hi folks,
>> I just upgraded the Hibernate Search library of my app, so I had to upgrade
>> Lucene too, from version 2.2 to 2.4.
>> In Lucene 2.4 the ISOLatin1AccentFilter class has changed and I can't figure
>> out how it works.
>> I use a TwoWayFieldBridge to index the data and this is my set method:
>>
>> public void set(String s, Object o, Document document, Field.Store store,
>>         Field.Index index, Float aFloat){
>>
>>     //MyObject has a field name
>>     MyObject objectToIndex;
>>
>>     //casting from Object to MyObject
>>     try{
>>         objectToIndex = MyObject.class.cast(o);
>>     }catch(ClassCastException cEx ){}
>>
>>     if (objectToIndex.getName() != null) {
>>
>>         ISOLatin1AccentFilter filter = new ISOLatin1AccentFilter(new
>>             StandardTokenizer(new StringReader(objectToIndex.getName())));
>>         filter.removeAccents(objectToIndex.getName().toCharArray(),
>>             objectToIndex.getName().length());
>>         Field name = new Field( "name",
>>             String.valueOf(objectToIndex.getName()).toLowerCase() , Field.Store.YES,
>>             Field.Index.UN_TOKENIZED );
>>
>>         document.add(name);
>>     }
>> }
>>
> I do not really understand what you are trying to do. Do you just
> want to remove the accents from the string and index it without passing
> it through an analyzer? (Field.Index.UN_TOKENIZED will not pass the
> field value to an analyzer.)
> Do you want to index this without an analyzer?
>
> If you pass an array to ISOLatin1AccentFilter#removeAccents(), the
> processed chars will be written to a private internal char array
> inside the ISOLatin1AccentFilter. You cannot use the removeAccents
> method on its own just to remove the accents.
> What you could do as a dirty workaround is the following:
>
> String foo = "HÄllo HÄllo HÄllo HÄllo HÄllo";
> ISOLatin1AccentFilter filter = new ISOLatin1AccentFilter(
>     new Tokenizer(new StringReader(foo)){
>         private boolean isRead = false;
>         public Token next(final Token reusableToken) throws IOException {
>             if(isRead){
>                 return null;
>             }
>             BufferedReader reader = new BufferedReader(this.input);
>             StringBuilder builder = new StringBuilder();
>
>             char[] buffer = new char[1024];
>             int read = -1;
>             while((read = reader.read(buffer)) > 0){
>                 builder.append(buffer, 0, read);
>             }
>             reusableToken.setTermText(builder.toString());
>             isRead = true;
>             return reusableToken;
>         }
>     });
> Token t = filter.next();
> String foo_without_accents = t.term();
> System.out.println(foo_without_accents);
>
> yields: HAllo HAllo HAllo HAllo HAllo
>
>
> simon
>>
>> but it doesn't work. If I pass an accented word for the property
>> objectToIndex.getName(), it keeps its accent :(
>> I think there is something wrong in my code when I create the new instance
>> of ISOLatin1AccentFilter, but I can't get it to work properly.
>> Could someone help me?
>> thanks a lot
>> --
>> View this message in context:
>> http://www.nabble.com/Removing-diacritics-with-ISOLatin1AccentFilter-tp24641618p24641618.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>
--
View this message in context: http://www.nabble.com/Removing-diacritics-with-ISOLatin1AccentFilter-tp24641618p24643036.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
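Simon's workaround wraps the entire string in a single token so that ISOLatin1AccentFilter processes it in one pass. As a sketch of a different technique that needs no custom Tokenizer, assuming Java 6 or later for `java.text.Normalizer`, the same input can be folded with Unicode NFD decomposition followed by removal of combining marks. `stripDiacritics` is a hypothetical helper name. This is not what ISOLatin1AccentFilter does internally (the filter works from its own table of Latin-1 replacements), and characters with no Unicode decomposition, such as ß, Æ or Ø, pass through this version unchanged.

```java
import java.text.Normalizer;
import java.util.Locale;

public class StripAccents {

    // Hypothetical helper, JDK-only: decompose to NFD, then drop combining
    // marks (Unicode category M). Not the ISOLatin1AccentFilter table, so
    // characters without a decomposition (e.g. ß, Æ, Ø) are left as-is.
    static String stripDiacritics(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        String foo = "H\u00c4llo H\u00c4llo H\u00c4llo H\u00c4llo H\u00c4llo";

        // foo.toCharArray() returns a fresh copy, so anything that writes into
        // that copy (or into its own private buffer) can never change foo.
        // Strings are immutable: use the returned value instead.
        String fooWithoutAccents = stripDiacritics(foo);
        System.out.println(fooWithoutAccents); // HAllo HAllo HAllo HAllo HAllo

        // For an UN_TOKENIZED field, a value like this (lowercased, if queries
        // are lowercased too) is what would go into new Field(...).
        System.out.println(fooWithoutAccents.toLowerCase(Locale.ROOT)); // hallo hallo hallo hallo hallo
    }
}
```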