On Tue, Nov 1, 2011 at 9:14 AM, Jukka Zitting <[email protected]> wrote:
> Hi,
>
> On Tue, Nov 1, 2011 at 1:48 PM, Robert Muir <[email protected]> wrote:
>> I really think tika should include the parts of icu4j it depends on.
>> Often open source projects are hesitant to include icu jar because of
>> its size, but thats silly since the size is just a catch-all.
>> We can use the webapp to make a smaller one that includes the minimum
>> of stuff Tika needs. http://apps.icu-project.org/datacustom/
>
> We need a version that's available on the central Maven repository.
>
Perhaps as a start we could include the whole icu4j jar from Maven, and look at 'trimming' it as a later optimization?

It would also be nice to look at removing the forked charset detection code (get whatever changes Tika has made into ICU, etc.).

--
lucidimagination.com
