[jira] [Updated] (UIMA-6413) Memory leak in FSClassRegistry

Richard Eckart de Castilho (Jira) Thu, 20 Jan 2022 23:50:06 -0800


     [ 
https://issues.apache.org/jira/browse/UIMA-6413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Richard Eckart de Castilho updated UIMA-6413:
---------------------------------------------
    Description: 
This is essentially a follow-up issue to UIMA-6276.

When a CAS is created, the cache {{FSClassRegistry.cl_to_type2JCas}} is filled 
with information about the JCas representation of the different types. This is 
a per-classloader cache: an entry is added for every classloader that is 
involved in the creation of a (J)CAS. Normally, classloaders are long-lived 
objects and a program only creates a small number of them during its runtime. 
But there are cases where classloaders are created in large numbers, and then 
we run into trouble. UIMA-6276 turned the cache into a weak map, hoping that 
once a classloader is garbage-collected, the cache would be cleaned up 
automatically. However, that idea was not thought through entirely: one of the 
pieces of information stored in the map is an {{FsGenerator3}}, and that 
generator is generated via the particular classloader that is the key in the 
map. Thus, a value in the map holds a strong reference to the weak key, which 
prevents the key from ever being garbage-collected. Other fields may contribute 
to that cycle as well.
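
The effect can be reproduced outside of UIMA with a plain {{WeakHashMap}}. The 
following is only a minimal sketch: {{Holder}} is a hypothetical stand-in for a 
cached value such as the generator, and a plain {{Object}} stands in for the 
classloader key. None of these names exist in UIMA.

```java
import java.util.Map;
import java.util.WeakHashMap;

class WeakKeyCycleDemo {
    /** Hypothetical cached value; 'owner' may point back at the map key. */
    static final class Holder {
        final Object owner;
        Holder(Object owner) { this.owner = owner; }
    }

    /** Returns true if the cache entry is still present after repeated GC requests. */
    static boolean entrySurvivesGc(boolean valueReferencesKey) {
        Map<Object, Holder> cache = new WeakHashMap<>();
        Object key = new Object();
        cache.put(key, new Holder(valueReferencesKey ? key : null));
        key = null; // drop our only direct reference to the key

        for (int i = 0; i < 20 && !cache.isEmpty(); i++) {
            System.gc(); // only a request; the JVM is free to ignore it
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return !cache.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println("value references key: entry survives = "
                + entrySurvivesGc(true));
        System.out.println("value independent:    entry survives = "
                + entrySurvivesGc(false));
    }
}
```

When the value references the key, the key stays strongly reachable through the 
map itself, so the entry can never be expunged. In the other case the entry 
usually disappears after a GC cycle, although that depends on the JVM honoring 
{{System.gc()}}.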

In particular, a new classloader is generated whenever a new 
{{ResourceManager}} with a custom datapath, classpath, or both is created. This 
typically happens when a PEAR is used, but there can be other reasons to create 
new custom resource managers.

Limiting the number of {{ResourceManager}}s in a system may not be feasible 
because typically there should be one per pipeline (to allow for shared 
resources). If pipelines are instantiated and destroyed repeatedly, it makes 
sense to create {{ResourceManager}}s alongside them.

However, the set of classloaders in a system is typically fairly stable. The 
{{ResourceManager}} internally wraps these classloaders with a 
{{UimaClassLoader}} (only if a specific classloader or a custom datapath is 
passed to the resource manager). So assuming that essentially always the same 
set of classloaders is provided (and maybe only a limited set of datapaths), it 
should be ok to introduce another cache of {{[classloader, datapath] -> 
UimaClassLoader}} to limit the number of {{UimaClassLoader}} instances and 
therefore limit the size of {{FSClassRegistry.cl_to_type2JCas}}.
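
A rough sketch of what such a cache could look like is given below. It is not 
the actual fix: {{WrapperLoaderCache}}, {{Key}} and {{wrapperFor}} are 
hypothetical names, and a plain {{URLClassLoader}} stands in for 
{{UimaClassLoader}} to keep the example self-contained.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

class WrapperLoaderCache {
    /** Hypothetical key: parent classloader (compared by identity) plus datapath. */
    private static final class Key {
        final ClassLoader parent;
        final String datapath;

        Key(ClassLoader parent, String datapath) {
            this.parent = parent;
            this.datapath = datapath;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return parent == other.parent && Objects.equals(datapath, other.datapath);
        }

        @Override
        public int hashCode() {
            return 31 * System.identityHashCode(parent) + Objects.hashCode(datapath);
        }
    }

    private final Map<Key, ClassLoader> cache = new ConcurrentHashMap<>();

    /** Returns the same wrapper for the same [classloader, datapath] pair. */
    ClassLoader wrapperFor(ClassLoader parent, String datapath) {
        // Assumption: URLClassLoader is only a placeholder for the real wrapper type.
        return cache.computeIfAbsent(new Key(parent, datapath),
                k -> new URLClassLoader(new URL[0], k.parent));
    }

    int size() {
        return cache.size();
    }
}
```

Note that this sketch holds strong references to the parent classloaders, so 
it only bounds the number of wrappers as long as the set of parents and 
datapaths is itself bounded. A real implementation would have to decide whether 
the keys should be held weakly.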

  was:
This is essentially a follow-up issue to UIMA-6276.

So, when a CAS is created, then a cache is filled in 


> Memory leak in FSClassRegistry
> ------------------------------
>
>                 Key: UIMA-6413
>                 URL: https://issues.apache.org/jira/browse/UIMA-6413
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Core Java Framework
>            Reporter: Richard Eckart de Castilho
>            Assignee: Richard Eckart de Castilho
>            Priority: Major
>             Fix For: 3.3.0SDK



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
