GitHub user ansell commented on the pull request:

    https://github.com/apache/any23/pull/17#issuecomment-201545776
  
    The system does seem a little too complex for our purposes and isn't usable
    because of that.
    
    Removing generics would be the first step IMO, as there are too many
    rawtypes definitions, which indicates that generics are being used badly.
    
    ContentExtractor could probably be removed completely rather than being
    refitted into the process after that, and the parser should always be set to
    parse as far as is practical for our purposes.
    
    It is a little strange that there isn't a buffered, markable InputStream
    provided for all of the steps to reuse as necessary, rather than pushing a raw
    InputStream or other source into different extractors.
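
    A minimal sketch of the buffered, markable stream idea, assuming Java 9+.
    The Step interface, the class name, and the two example steps are
    hypothetical and are not part of the Any23 API; only the standard
    java.io mark/reset calls are real. Each step reads from the same
    BufferedInputStream, which is rewound before the next step runs.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class MarkableStreamSketch {

    /** Hypothetical stand-in for one processing step; not an Any23 interface. */
    interface Step {
        String process(InputStream in) throws IOException;
    }

    /**
     * Wraps the raw stream once and lets every step reuse it.
     * maxBytes is the mark read limit: a step that reads more than this
     * may invalidate the mark, in which case reset() throws IOException.
     */
    static List<String> runAll(InputStream raw, int maxBytes, List<Step> steps)
            throws IOException {
        InputStream in = new BufferedInputStream(raw);
        List<String> results = new ArrayList<>();
        for (Step step : steps) {
            in.mark(maxBytes);              // remember the current position
            results.add(step.process(in));  // the step may consume the stream
            in.reset();                     // rewind for the next step
        }
        return results;
    }

    /** Example step: peek at the first five bytes, e.g. for type sniffing. */
    static String sniffPrefix(InputStream in) throws IOException {
        byte[] head = new byte[5];
        int n = in.readNBytes(head, 0, head.length);
        return new String(head, 0, n, StandardCharsets.US_ASCII);
    }

    /** Example step: consume the whole stream and report its length. */
    static String countBytes(InputStream in) throws IOException {
        return Integer.toString(in.readAllBytes().length);
    }

    /** Runs both example steps over one small in-memory document. */
    static List<String> demo() throws IOException {
        byte[] doc = "<html><body>hi</body></html>".getBytes(StandardCharsets.US_ASCII);
        List<Step> steps = List.of(
                MarkableStreamSketch::sniffPrefix,
                MarkableStreamSketch::countBytes);
        return runAll(new ByteArrayInputStream(doc), 1024, steps);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

    The read limit is the main design choice here: it bounds how much has to
    be buffered, so a document larger than the limit would need a different
    strategy (for example a temporary file) rather than mark/reset.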


