Julien Nioche created NUTCH-1652:
------------------------------------

             Summary: Avoid instanciation of MimeUtil for each Content object created
                 Key: NUTCH-1652
                 URL: https://issues.apache.org/jira/browse/NUTCH-1652
             Project: Nutch
          Issue Type: Improvement
    Affects Versions: 1.7
            Reporter: Julien Nioche

Content objects instantiate and hold a MimeUtil in the constructor used by the HttpBase class. This is wasteful and needlessly slows down the creation of Content objects, because each MimeUtil creates a new Tika instance, reads from the configuration, etc.

Instead, we could create a single instance of MimeUtil in HttpBase and pass it to a new Content constructor:

{code}
public Content(String url, String base, byte[] content, String contentType, Metadata metadata, MimeUtil mime)
{code}

We would also need to make sure that synchronisation is handled properly in MimeUtil (especially for the calls to Tika), as Content objects are created in a multithreaded environment.
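
The proposal above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Nutch code: SketchMimeUtil, SketchContent and SketchHttpBase are hypothetical stand-ins for MimeUtil, Content and HttpBase, the resolveType method and its extension check are invented for the example, and a coarse synchronized method is just one possible way to guard a detector that is assumed not to be thread-safe.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for MimeUtil. Construction is assumed to be costly
// (the real class creates a Tika instance and reads the configuration).
class SketchMimeUtil {
    // Counts constructions so the cost of per-object instantiation is visible.
    static final AtomicInteger INSTANCES = new AtomicInteger();

    SketchMimeUtil() {
        INSTANCES.incrementAndGet();
    }

    // Coarse lock: assumes the underlying detector is not thread-safe.
    // The detection logic here is a placeholder, not Tika's.
    synchronized String resolveType(String typeName, String url, byte[] data) {
        if (typeName != null && !typeName.isEmpty()) {
            return typeName;
        }
        return url.endsWith(".html") ? "text/html" : "application/octet-stream";
    }
}

// Hypothetical stand-in for Content, reduced to the fields needed here.
class SketchContent {
    final String url;
    final String contentType;

    // Current style: every object builds its own helper.
    SketchContent(String url, byte[] content, String contentType) {
        this(url, content, contentType, new SketchMimeUtil());
    }

    // Proposed style: the caller supplies a shared helper.
    SketchContent(String url, byte[] content, String contentType, SketchMimeUtil mime) {
        this.url = url;
        this.contentType = mime.resolveType(contentType, url, content);
    }
}

// Hypothetical stand-in for HttpBase: one helper for all objects it creates.
class SketchHttpBase {
    private final SketchMimeUtil mimeTypes = new SketchMimeUtil();

    SketchContent fetch(String url, byte[] body, String headerType) {
        return new SketchContent(url, body, headerType, mimeTypes);
    }
}

class SharedMimeDemo {
    public static void main(String[] args) throws InterruptedException {
        SketchHttpBase http = new SketchHttpBase();
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            // Each worker creates many objects through the same shared helper.
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 100; i++) {
                    http.fetch("http://example.com/page" + i + ".html", new byte[0], null);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        System.out.println("helper instances: " + SketchMimeUtil.INSTANCES.get());
    }
}
```

In this sketch the four worker threads create 400 objects between them while the helper is constructed only once. Locking the whole method is the simplest option but serialises type detection across fetcher threads; whether a finer-grained approach is possible depends on how thread-safe the Tika calls actually are, which would need to be checked.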