[ https://issues.apache.org/jira/browse/TIKA-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated TIKA-4132:
------------------------------
    Description:
Let's use this ticket to track removing deprecated bits and making small breaking changes to 3.x.

Small breaking changes:
1) move the boilerpipe handler out of tika-parsers into a new boilerpipe module underneath tika-handlers?
2) remove handlers writing directly to outputstreams; they should be writing to writers ... or outputstreams+charsets
3) swap in jsoup as the default html parser. If you're referring to the o.a.t.p.html.HtmlParser in your config code, you'll need to change this to o.a.t.p.html.JSoupParser.

Deprecated items I left in:
1) remove digesting option from app's cli -- I decided to keep this in. I removed the deprecation message. Tika-app should make as much stuff as easy as possible.
2) we deprecated a bunch of File calls in favor of Path. I've left those in for now (except in the tika-batch module).

Deprecate new items:
1) we should get rid of AbstractParser, but we can't because that will break every parser that extends it (including vorbis etc). So, let's remove it as much as we can in our code base and properly deprecate it for removal in 4.x.

  was:
Let's use this ticket to track removing deprecated bits and making small breaking changes to 3.x.

Small breaking changes:
1) move the boilerpipe handler out of tika-parsers into a new boilerpipe module underneath tika-handlers?

Deprecated items:
1) remove digesting option from app's cli -- I decided to keep this in. I removed the deprecation message.

Deprecate new items:
1) we should get rid of AbstractParser, but we can't because that will break every parser that extends it (including vorbis etc).
> Remove deprecated items and carry out other small breaking changes for 3.x
> --------------------------------------------------------------------------
>
>                 Key: TIKA-4132
>                 URL: https://issues.apache.org/jira/browse/TIKA-4132
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> Let's use this ticket to track removing deprecated bits and making small
> breaking changes to 3.x.
> Small breaking changes:
> 1) move the boilerpipe handler out of tika-parsers into a new boilerpipe
> module underneath tika-handlers?
> 2) remove handlers writing directly to outputstreams; they should be writing
> to writers ... or outputstreams+charsets
> 3) swap in jsoup as the default html parser. If you're referring to the
> o.a.t.p.html.HtmlParser in your config code, you'll need to change this to
> o.a.t.p.html.JSoupParser.
> Deprecated items I left in:
> 1) remove digesting option from app's cli -- I decided to keep this in. I
> removed the deprecation message. Tika-app should make as much stuff as easy
> as possible.
> 2) we deprecated a bunch of File calls in favor of Path. I've left those in
> for now (except in the tika-batch module).
> Deprecate new items:
> 1) we should get rid of AbstractParser, but we can't because that will break
> every parser that extends it (including vorbis etc). So, let's remove it as
> much as we can in our code base and properly deprecate it for removal in 4.x.
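
For item 2 under "Small breaking changes" (handlers writing to writers, or to outputstreams+charsets), here is a rough sketch of what "outputstream + charset" could look like on the caller side. It uses only the JDK; the class and method names (CharsetAwareOutput, writerFor, encode) are made up for illustration and are not Tika API. The actual handler constructors in 3.x may well look different.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical helper for illustration only; not part of Tika.
final class CharsetAwareOutput {

    private CharsetAwareOutput() {
    }

    // Wrap a byte stream in a Writer with an explicit charset, so the
    // encoding never silently falls back to the platform default.
    static Writer writerFor(OutputStream out, Charset charset) {
        return new BufferedWriter(new OutputStreamWriter(out, charset));
    }

    // Small demonstration: encode text through the writer and return the bytes.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // try-with-resources closes (and therefore flushes) the writer
        // before the bytes are read back below.
        try (Writer writer = writerFor(bytes, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

The point of passing the charset alongside the stream is that the same text produces different bytes under different encodings, so a handler given only a bare OutputStream has to guess. A caller would hand the Writer from writerFor(...) to whatever handler accepts a Writer.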