Julien Nioche created NUTCH-1652:
------------------------------------

             Summary: Avoid instantiation of MimeUtil for each Content object created
                 Key: NUTCH-1652
                 URL: https://issues.apache.org/jira/browse/NUTCH-1652
             Project: Nutch
          Issue Type: Improvement
    Affects Versions: 1.7
            Reporter: Julien Nioche


Content objects instantiate and hold a MimeUtil in the constructor used by the 
HttpBase class. This is wasteful and unnecessarily slows down the creation of 
Content objects, as each MimeUtil creates a new Tika instance, reads from the 
configuration, etc.

Instead we could create a single instance of the MimeUtil class and pass it to 
a new Content constructor:

{code}
public Content(String url, String base, byte[] content, String contentType,
      Metadata metadata, MimeUtil mime)
{code}

and create a single instance of MimeUtil in HttpBase. We would also need to 
make sure that the synchronisation is handled properly in MimeUtil (especially 
for the calls to Tika), as Content objects are created in a multithreaded 
environment.
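
To make the idea concrete, here is a minimal, self-contained sketch of the pattern. It does not use the Nutch classes: SharedMimeDetector, PageContent and ProtocolSketch are hypothetical stand-ins for MimeUtil, Content and HttpBase, and the detection logic is a placeholder. The synchronized keyword is a conservative assumption and could be dropped if the underlying Tika calls turn out to be thread-safe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for MimeUtil: assumed costly to build, so built once and shared.
final class SharedMimeDetector {
    // Counts constructions so the sketch can show that only one happens.
    static final AtomicInteger INSTANCES = new AtomicInteger();

    SharedMimeDetector() {
        // Real code would read the configuration and create the Tika instance here.
        INSTANCES.incrementAndGet();
    }

    // Conservative guard (assumption): serialises detection calls across threads.
    synchronized String detect(String url, String declaredType) {
        if (declaredType != null && !declaredType.isEmpty()) {
            return declaredType;
        }
        // Placeholder logic only; not how MimeUtil resolves types.
        return url.endsWith(".html") ? "text/html" : "application/octet-stream";
    }
}

// Hypothetical stand-in for Content: receives the detector instead of creating one.
final class PageContent {
    final String url;
    final String contentType;

    PageContent(String url, String declaredType, SharedMimeDetector mime) {
        this.url = url;
        this.contentType = mime.detect(url, declaredType);
    }
}

// Hypothetical stand-in for HttpBase: owns the single detector instance.
final class ProtocolSketch {
    private final SharedMimeDetector mime = new SharedMimeDetector();

    PageContent fetch(String url, String declaredType) {
        return new PageContent(url, declaredType, mime);
    }

    // Builds content objects from several threads that all share one detector.
    static List<String> fetchAll(List<String> urls, int threads) {
        ProtocolSketch protocol = new ProtocolSketch();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<PageContent>> futures = new ArrayList<>();
            for (String u : urls) {
                Callable<PageContent> task = () -> protocol.fetch(u, null);
                futures.add(pool.submit(task));
            }
            List<String> types = new ArrayList<>();
            for (Future<PageContent> f : futures) {
                types.add(f.get().contentType);
            }
            return types;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling ProtocolSketch.fetchAll with any number of URLs constructs exactly one SharedMimeDetector, whereas creating the detector inside the PageContent constructor would build one per object.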




--
This message was sent by Atlassian JIRA
(v6.1#6144)
