Re: Use of Unicode data in Lucene

Robert Muir Wed, 25 Feb 2009 12:47:35 -0800

Ken,

Just my opinion here... I work with a lot of multilingual data with Lucene.
I can't imagine many serious real-world applications doing things such as
search that wouldn't need ICU for something anyway... even if it's not the
Lucene piece requiring it...


I hope this doesn't discourage you from doing what you are trying to do...
just my opinion. Maybe the JDK will catch up sometime soon and this won't be
an issue for long.

On Wed, Feb 25, 2009 at 3:22 PM, Ken Krugler <kkrugler_li...@transpac.com> wrote:

> Hi all,
>
> I've started working on something similar to
> https://issues.apache.org/jira/browse/LUCENE-1343, which is about creating
> a better (more universal) normalizer for words that "look the same".
>
> I'd like to avoid the dependency on ICU4J, which (I think) would otherwise
> prevent the code from being part of the core - due to license issues, it
> would have to languish in contrib.
>
> I can implement the functionality just using the data tables from the
> Unicode Consortium, including http://www.unicode.org/reports/tr39, but
> there's still the issue of the Unicode data license and its compatibility
> with Apache 2.0.
>
> Does anybody know whether http://www.unicode.org/copyright.html creates an
> issue? What's the process for vetting a license? Or is this something I
> should be posting to a different list?
>
> Thanks,
>
> -- Ken
> --
> Ken Krugler
> +1 530-210-6378


-- 
Robert Muir
rcm...@gmail.com
