Richard Eckart de Castilho created UIMA-6232:
------------------------------------------------
Summary: Reduce overhead of createTypeSystemDescription()
Key: UIMA-6232
URL: https://issues.apache.org/jira/browse/UIMA-6232
Project: UIMA
Issue Type: Improvement
Reporter: Richard Eckart de Castilho
Assignee: Richard Eckart de Castilho
Fix For: 2.6.0uimaFIT
uimaFIT offers a range of factory methods which use classpath scanning to
locate type system descriptions, type priority definitions and index
definitions.
The present implementation scans for each type of object once and then stores
the locations in which the descriptors were found in a global static variable.
The user can call a method to clear this variable and force a re-scan.
Whenever client code calls a method such as {{createTypeSystemDescription()}},
the descriptors at the cached locations are read and parsed, and a corresponding
Java descriptor object is created and returned.
This issue is about two problems with this approach:
1) The descriptor locations are discovered using only the ClassLoader that is
active the first time scanning takes place. If {{createTypeSystemDescription()}}
is later called in the context of a ClassLoader with access to a different set
of descriptions, that difference is ignored.
2) Parsing the XML files on every call to e.g. {{createTypeSystemDescription()}}
slows uimaFIT down overall. These methods are potentially called very often, in
particular every time {{createEngineDescription()}} or similar methods are
called. Depending on the context, the parse overhead can have a significant
impact on the overall execution time.
As a solution for 1), we could adopt an approach similar to the one used for JCas
wrapper classes in the JCasImpl: the locations are stored in a {{WeakHashMap}}
mapping the current ClassLoader to the discovered locations. The "current"
ClassLoader is obtained via the Spring {{ClassUtils.getDefaultClassLoader()}},
which is also (indirectly) used in many other places in uimaFIT. In particular,
this method uses the thread context ClassLoader if one is available.
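A minimal sketch of the idea in 1), not the actual uimaFIT code: the class name {{LocationCache}}, the {{scanner}} function and the {{currentClassLoader()}} helper are hypothetical. The helper is a simplified stand-in for the Spring {{ClassUtils.getDefaultClassLoader()}} so that the example has no dependencies.

```java
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Hypothetical sketch: caches discovered descriptor locations per ClassLoader. */
final class LocationCache {
  // Weak keys: an entry can be dropped once its ClassLoader is no longer
  // strongly referenced elsewhere.
  private final Map<ClassLoader, List<String>> locations = new WeakHashMap<>();

  // Stand-in for the real classpath scan (assumption, supplied by the caller).
  private final Function<ClassLoader, List<String>> scanner;

  LocationCache(Function<ClassLoader, List<String>> scanner) {
    this.scanner = scanner;
  }

  /** Simplified stand-in for Spring's ClassUtils.getDefaultClassLoader(). */
  static ClassLoader currentClassLoader() {
    ClassLoader cl = Thread.currentThread().getContextClassLoader();
    if (cl == null) {
      cl = LocationCache.class.getClassLoader();
    }
    if (cl == null) {
      cl = ClassLoader.getSystemClassLoader();
    }
    return cl;
  }

  /** Scans once per ClassLoader; later calls reuse the cached locations. */
  synchronized List<String> getLocations() {
    return locations.computeIfAbsent(currentClassLoader(), scanner);
  }
}
```

With this layout, a call made under a different thread context ClassLoader triggers its own scan instead of reusing the locations found for the first one.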
As a solution for 2), we keep a {{WeakHashMap}} cache not only for the
locations but also for the parsed and aggregated XML files. When e.g.
{{createTypeSystemDescription()}} is called and the cache already contains a
matching descriptor, a deep clone of it is returned. A similar approach (cloning a
descriptor) was recently introduced into UIMA Core to avoid repeatedly
loading and parsing default flow controller definitions.
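The clone-on-read cache in 2) could look roughly like the sketch below. {{DescriptorCache}}, {{parser}} and {{cloner}} are hypothetical names; in real code the parser would read and merge the XML descriptors and the cloner would produce a deep copy of the descriptor object.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;
import java.util.function.UnaryOperator;

/** Hypothetical sketch: caches one parsed descriptor per ClassLoader. */
final class DescriptorCache<T> {
  // Weak keys only help if the cached value does not itself hold a strong
  // reference to the ClassLoader used as its key.
  private final Map<ClassLoader, T> cache = new WeakHashMap<>();

  private final Function<ClassLoader, T> parser; // expensive: read and parse XML
  private final UnaryOperator<T> cloner;         // cheap: deep copy of the result

  DescriptorCache(Function<ClassLoader, T> parser, UnaryOperator<T> cloner) {
    this.parser = parser;
    this.cloner = cloner;
  }

  /** Parses at most once per ClassLoader and always hands out a fresh copy. */
  synchronized T get(ClassLoader classLoader) {
    T master = cache.computeIfAbsent(classLoader, parser);
    return cloner.apply(master);
  }
}
```

Returning a copy rather than the cached instance means callers can modify the descriptor they receive without affecting later callers.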