On Tue, Nov 1, 2011 at 8:48 AM, Robert Muir <[email protected]> wrote:
> I really think tika should include the parts of icu4j it depends on.
> Often open source projects are hesitant to include icu jar because of
> its size, but that's silly since the size is just a catch-all.
> We can use the webapp to make a smaller one that includes the minimum
> of stuff Tika needs. http://apps.icu-project.org/datacustom/
>
> Maybe we should open a JIRA issue to fix this? I think it's a bug that
> Arabic and Persian text silently come out corrupted if you don't have
> this in your classpath.

+1

I think it's awful to just silently produce bad results.

Mike McCandless

http://blog.mikemccandless.com
