Richard Eckart de Castilho created UIMA-6232:
------------------------------------------------

             Summary: Reduce overhead of createTypeSystemDescription()
                 Key: UIMA-6232
                 URL: https://issues.apache.org/jira/browse/UIMA-6232
             Project: UIMA
          Issue Type: Improvement
            Reporter: Richard Eckart de Castilho
            Assignee: Richard Eckart de Castilho
             Fix For: 2.6.0uimaFIT


uimaFIT offers a range of factory methods which use classpath scanning to 
locate type system descriptions, type priority definitions and index 
definitions. 

The present implementation scans for each type of object once and then stores 
the locations in which the descriptors were found in a global static variable. 
The user can call a method to clear this variable and force a re-scan.

Whenever client code calls a method such as {{createTypeSystemDescription()}}, 
the descriptors at the cached locations are read and parsed, and a 
corresponding Java descriptor object is created and returned.

This issue is about two problems with this approach:

1) Descriptor discovery only takes the ClassLoader situation into account the 
first time a scan takes place. If {{createTypeSystemDescription()}} is later 
called in the context of a ClassLoader with access to a different set of 
descriptors, the difference is ignored.
2) Parsing the XML files every time a method such as 
{{createTypeSystemDescription()}} is called slows uimaFIT down overall. These 
methods are potentially called very often, in particular every time 
{{createEngineDescription()}} or a similar method is called. Depending on the 
context, the parse overhead can have a significant impact on the overall 
execution time.

As a solution for 1), we could adopt an approach similar to the one used for 
JCas wrapper classes in the JCasImpl: the locations are stored in a 
{{WeakHashMap}} mapping the current ClassLoader to the discovered locations. 
The "current" ClassLoader is obtained via the Spring 
{{ClassUtils.getDefaultClassLoader()}}, which is also (indirectly) used in many 
other places in uimaFIT. In particular, this method uses the Thread context 
ClassLoader if one is available.
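
A minimal sketch of the per-ClassLoader location cache, not the actual uimaFIT 
code. The class {{LocationCacheSketch}}, its methods, and the scanner function 
are hypothetical names. To keep the example free of dependencies, 
{{currentClassLoader()}} is a stand-in that mimics the lookup order of the 
Spring {{ClassUtils.getDefaultClassLoader()}} instead of calling it.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Hypothetical sketch: one cached scan result per ClassLoader. */
final class LocationCacheSketch {
    // Weak keys: an entry can be dropped once its ClassLoader is no longer
    // strongly referenced elsewhere. The cached values must not reference the
    // ClassLoader themselves, otherwise the entry would never be released.
    private static final Map<ClassLoader, List<String>> LOCATIONS =
            Collections.synchronizedMap(new WeakHashMap<>());

    /** Stand-in for Spring's ClassUtils.getDefaultClassLoader() (assumption). */
    static ClassLoader currentClassLoader() {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = LocationCacheSketch.class.getClassLoader();
        }
        if (cl == null) {
            cl = ClassLoader.getSystemClassLoader();
        }
        return cl;
    }

    /** Runs the (hypothetical) scanner at most once per current ClassLoader. */
    static List<String> locations(Function<ClassLoader, List<String>> scanner) {
        return LOCATIONS.computeIfAbsent(currentClassLoader(), scanner);
    }
}
```

With this layout, a call made under a different Thread context ClassLoader 
triggers its own scan instead of reusing the result of the first one.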

As a solution for 2), we keep a {{WeakHashMap}} cache not only for the 
locations, but also for the parsed and aggregated XML files. When a method such 
as {{createTypeSystemDescription()}} is called and the cache already contains a 
matching descriptor, a deep clone of that descriptor is returned, so callers 
cannot modify the cached instance. A similar approach (cloning a descriptor) 
was recently also introduced into UIMA Core to avoid repeatedly loading and 
parsing default flow controller definitions.
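
A minimal sketch of the parse-once, clone-on-read idea, again not the actual 
uimaFIT code. {{DescriptorCacheSketch}}, the nested {{Descriptor}} class, and 
{{deepCopy()}} are hypothetical stand-ins for the real descriptor type and its 
clone operation; the parser is passed in as a supplier.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Supplier;

/** Hypothetical sketch: cache the parsed descriptor, hand out deep copies. */
final class DescriptorCacheSketch {
    /** Hypothetical, mutable stand-in for a parsed type system descriptor. */
    static final class Descriptor {
        final List<String> typeNames = new ArrayList<>();

        Descriptor deepCopy() {
            Descriptor copy = new Descriptor();
            // Strings are immutable, so copying the list yields a deep copy.
            copy.typeNames.addAll(typeNames);
            return copy;
        }
    }

    // One parsed master descriptor per ClassLoader, released with its key.
    private static final Map<ClassLoader, Descriptor> PARSED =
            Collections.synchronizedMap(new WeakHashMap<>());

    /** Parses at most once per ClassLoader; every caller gets its own copy. */
    static Descriptor create(ClassLoader cl, Supplier<Descriptor> parser) {
        Descriptor master = PARSED.computeIfAbsent(cl, key -> parser.get());
        return master.deepCopy();
    }
}
```

Returning a copy rather than the cached instance keeps the cache safe when 
client code modifies the descriptor it received.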
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
