Re: Possible optimization for class loading?

2019-11-19 Thread Mark Thomas
On 17/11/2019 16:34, Rainer Jung wrote:
> Am 17.11.2019 um 17:21 schrieb Konstantin Kolinko:



>> Looking into AbstractFileResourceSet.file() from the above stacktrace,
>> some check are already there, e.g. "mustExist" flag. I see that
>> DirResourceSet.getResource(DirResourceSet.java:101) calls file(...,
>> false), so "mustExist" is false. It is just a first glance, I have not
>> dug further yet.
> 
> This is correct. A question is, whether all of the checks and
> normalizations in the default DirResourceSet are need for a resource
> with path starting with WEB-INF/classes, if WEB-INF/classes does not
> exist. AbstractFileResourceSet in my case does return a non-null File
> object after the (expensive) checks, which then is wrapped by
> DirResourceSet into an EmptyResource because the file does not exist.
> 
> Later is is found in the jar files, so the EmptyResource isn't used.
> 
> If the class does not get found in one of the jars, one would probably
> then have to create an appropriate EmptyResource to return.

I think there is scope for some optimisation here. I only took a quick
look but I couldn't see a way to do it without some changes to the API
(I'm fine with changes but was hoping to be able to do it without).

Mark

-
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org



Re: Possible optimization for class loading?

2019-11-17 Thread Rainer Jung

Am 17.11.2019 um 17:21 schrieb Konstantin Kolinko:

вс, 17 нояб. 2019 г. в 18:40, Rainer Jung :


I noticed a slowness during Webapp startup on the Winows platform.
Investigation showed, that about three seconds of the webapp startup
time - roughly 1/3 of the total time - was due to
File.getCanonicalPath(). That call seems to be especially expensive on
Windows, because it iterates over each component of the path in question
and does at least three system calls on each of them. So if you app is
installed in a somewhat deeply nested path, getCanonicalPath() gets
especially expensive.



Just as a reminder:
getCanonicalPath() is necessary to deal with file system on Windows
being case-insensitive. (Removing those calls is likely to lead to
exploitable security issues.) On case-sensitive filesystems one can
configure  in Tomcat 8.5/9.0 and some
of those checks will be disabled.

http://tomcat.apache.org/tomcat-8.5-doc/config/resources.html

In Tomcat 7.0 the same configuration flag exists on Context,

http://tomcat.apache.org/tomcat-7.0-doc/config/context.html


Yes, setting allowLinking="true" fixes the performance impact but is not 
a real option on Windows, where the problem was observed. It wasn't 
observed on Linux, I guess the performance penalty of getCanonicalPath() 
on that platform is smaller (but didn't check).



One can check this with a simple test app and running the ProcessMonitpr
from SysInternals, which can nicely trace file system operations.

I noticed that by far the most occurrences happened during webapp class
loading. StandardRoot initializes resources to contain a DirResourceSet
first and then a JarResourceSet per WEB-INF/lib jar file.

Now any class loading (of not yet loaded classes) goes through the
following stack (this is for TC 8.5):

...
java.base@13.0.1/java.io.File.getCanonicalPath(File.java:618)
org.apache.catalina.webresources.AbstractFileResourceSet.file(AbstractFileResourceSet.java:90)
org.apache.catalina.webresources.DirResourceSet.getResource(DirResourceSet.java:101)
org.apache.catalina.webresources.StandardRoot.getResourceInternal(StandardRoot.java:281)
org.apache.catalina.webresources.Cache.getResource(Cache.java:62)
org.apache.catalina.webresources.StandardRoot.getResource(StandardRoot.java:216)
org.apache.catalina.webresources.StandardRoot.getClassLoaderResource(StandardRoot.java:225)
org.apache.catalina.loader.WebappClassLoaderBase.findClassInternal(WebappClassLoaderBase.java:2279)
org.apache.catalina.loader.WebappClassLoaderBase.findClass(WebappClassLoaderBase.java:857)
org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1329)
- locked (a java.lang.Object@1797c4d) index 11 frame
org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1329)
org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1182)
...

In our case, the webapp does not have a WEB-INF/classes directory.
Nevertheless TC always goes through the DirResourceSet running the
expensive getCanonicalPath(), returning an EmptyResource and then
looking for the class and finding it in one of the jar file resources.
This happens about 5000 times (due to the number of classes to load).

I know that we have to do this in general, because WEB-INF/classes have
precedence over WEB-INF/lib, but maybe we could optimize in the case we
are looking for a class but WEB-INF/classes does not even exist? Would
this be a reasonable general optimization?


There are two scenarios:
(a) changes are expected: the web application classes can be updated
at runtime (with new classes being added),
(b) no changes: the web application classes are not updated at runtime.

(b) is certainly the case when the application runs in production mode
(when JSPs cannot be updated either)

(a) may be useful in some scenarios. Maybe in production if some
classes are added to a web application at run time as plugins.

I think that (b) is the case even when running in development mode, as
we are talking about classes, not JSPs here. IIRC Eclipse IDE restarts
a web application when its classes are updated and recompiled by the
IDE. I think that (a) is not used in development.
What do others think about it?
I remember some questions about hot-swapping of classes (for sake of
debugging), but Tomcat does not have such feature yet.

I think we can assume (b) by default, but have a flag to specify that
it is really (a).

If it is the (b) case, a single call to File.exists() /
File.isDirectory() is sufficient to remember "the classes directory
does not exist" and stop further lookups.


+1


If if is the (a) case, a call to call File.exists() /
File.isDirectory() can be added as an optimization before calling
getCanonicalPath().


+1


I think that we cannot remember "the classes directory does not exist"
fact without having a flag to disable such feature.


+1


Looking into AbstractFileResourceSet.file() from the above stacktrace,
some check are already there, 

Re: Possible optimization for class loading?

2019-11-17 Thread Konstantin Kolinko
вс, 17 нояб. 2019 г. в 18:40, Rainer Jung :
>
> I noticed a slowness during Webapp startup on the Winows platform.
> Investigation showed, that about three seconds of the webapp startup
> time - roughly 1/3 of the total time - was due to
> File.getCanonicalPath(). That call seems to be especially expensive on
> Windows, because it iterates over each component of the path in question
> and does at least three system calls on each of them. So if you app is
> installed in a somewhat deeply nested path, getCanonicalPath() gets
> especially expensive.
>

Just as a reminder:
getCanonicalPath() is necessary to deal with file system on Windows
being case-insensitive. (Removing those calls is likely to lead to
exploitable security issues.) On case-sensitive filesystems one can
configure  in Tomcat 8.5/9.0 and some
of those checks will be disabled.

http://tomcat.apache.org/tomcat-8.5-doc/config/resources.html

In Tomcat 7.0 the same configuration flag exists on Context,

http://tomcat.apache.org/tomcat-7.0-doc/config/context.html

> One can check this with a simple test app and running the ProcessMonitpr
> from SysInternals, which can nicely trace file system operations.
>
> I noticed that by far the most occurrences happened during webapp class
> loading. StandardRoot initializes resources to contain a DirResourceSet
> first and then a JarResourceSet per WEB-INF/lib jar file.
>
> Now any class loading (of not yet loaded classes) goes through the
> following stack (this is for TC 8.5):
>
> ...
> java.base@13.0.1/java.io.File.getCanonicalPath(File.java:618)
> org.apache.catalina.webresources.AbstractFileResourceSet.file(AbstractFileResourceSet.java:90)
> org.apache.catalina.webresources.DirResourceSet.getResource(DirResourceSet.java:101)
> org.apache.catalina.webresources.StandardRoot.getResourceInternal(StandardRoot.java:281)
> org.apache.catalina.webresources.Cache.getResource(Cache.java:62)
> org.apache.catalina.webresources.StandardRoot.getResource(StandardRoot.java:216)
> org.apache.catalina.webresources.StandardRoot.getClassLoaderResource(StandardRoot.java:225)
> org.apache.catalina.loader.WebappClassLoaderBase.findClassInternal(WebappClassLoaderBase.java:2279)
> org.apache.catalina.loader.WebappClassLoaderBase.findClass(WebappClassLoaderBase.java:857)
> org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1329)
> - locked (a java.lang.Object@1797c4d) index 11 frame
> org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1329)
> org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1182)
> ...
>
> In our case, the webapp does not have a WEB-INF/classes directory.
> Nevertheless TC always goes through the DirResourceSet running the
> expensive getCanonicalPath(), returning an EmptyResource and then
> looking for the class and finding it in one of the jar file resources.
> This happens about 5000 times (due to the number of classes to load).
>
> I know that we have to do this in general, because WEB-INF/classes have
> precedence over WEB-INF/lib, but maybe we could optimize in the case we
> are looking for a class but WEB-INF/classes does not even exist? Would
> this be a reasonable general optimization?

There are two scenarios:
(a) changes are expected: the web application classes can be updated
at runtime (with new classes being added),
(b) no changes: the web application classes are not updated at runtime.

(b) is certainly the case when the application runs in production mode
(when JSPs cannot be updated either)

(a) may be useful in some scenarios. Maybe in production if some
classes are added to a web application at run time as plugins.

I think that (b) is the case even when running in development mode, as
we are talking about classes, not JSPs here. IIRC Eclipse IDE restarts
a web application when its classes are updated and recompiled by the
IDE. I think that (a) is not used in development.
What do others think about it?
I remember some questions about hot-swapping of classes (for sake of
debugging), but Tomcat does not have such feature yet.

I think we can assume (b) by default, but have a flag to specify that
it is really (a).

If it is the (b) case, a single call to File.exists() /
File.isDirectory() is sufficient to remember "the classes directory
does not exist" and stop further lookups.

If if is the (a) case, a call to call File.exists() /
File.isDirectory() can be added as an optimization before calling
getCanonicalPath().

I think that we cannot remember "the classes directory does not exist"
fact without having a flag to disable such feature.


Looking into AbstractFileResourceSet.file() from the above stacktrace,
some check are already there, e.g. "mustExist" flag. I see that
DirResourceSet.getResource(DirResourceSet.java:101) calls file(...,
false), so "mustExist" is false. It is just a first glance, I have not
dug further yet.

Best regards,
Konstantin Kolinko


Possible optimization for class loading?

2019-11-17 Thread Rainer Jung
I noticed a slowness during Webapp startup on the Winows platform. 
Investigation showed, that about three seconds of the webapp startup 
time - roughly 1/3 of the total time - was due to 
File.getCanonicalPath(). That call seems to be especially expensive on 
Windows, because it iterates over each component of the path in question 
and does at least three system calls on each of them. So if you app is 
installed in a somewhat deeply nested path, getCanonicalPath() gets 
especially expensive.


One can check this with a simple test app and running the ProcessMonitpr 
from SysInternals, which can nicely trace file system operations.


I noticed that by far the most occurrences happened during webapp class 
loading. StandardRoot initializes resources to contain a DirResourceSet 
first and then a JarResourceSet per WEB-INF/lib jar file.


Now any class loading (of not yet loaded classes) goes through the 
following stack (this is for TC 8.5):


...
java.base@13.0.1/java.io.File.getCanonicalPath(File.java:618)
org.apache.catalina.webresources.AbstractFileResourceSet.file(AbstractFileResourceSet.java:90)
org.apache.catalina.webresources.DirResourceSet.getResource(DirResourceSet.java:101)
org.apache.catalina.webresources.StandardRoot.getResourceInternal(StandardRoot.java:281)
org.apache.catalina.webresources.Cache.getResource(Cache.java:62)
org.apache.catalina.webresources.StandardRoot.getResource(StandardRoot.java:216)
org.apache.catalina.webresources.StandardRoot.getClassLoaderResource(StandardRoot.java:225)
org.apache.catalina.loader.WebappClassLoaderBase.findClassInternal(WebappClassLoaderBase.java:2279)
org.apache.catalina.loader.WebappClassLoaderBase.findClass(WebappClassLoaderBase.java:857)
org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1329)
- locked (a java.lang.Object@1797c4d) index 11 frame 
org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1329)

org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1182)
...

In our case, the webapp does not have a WEB-INF/classes directory. 
Nevertheless TC always goes through the DirResourceSet running the 
expensive getCanonicalPath(), returning an EmptyResource and then 
looking for the class and finding it in one of the jar file resources. 
This happens about 5000 times (due to the number of classes to load).


I know that we have to do this in general, because WEB-INF/classes have 
precedence over WEB-INF/lib, but maybe we could optimize in the case we 
are looking for a class but WEB-INF/classes does not even exist? Would 
this be a reasonable general optimization?


Unfortunately the ResourceSet classes can not be easily extended. Since 
most of their fields are private, one would have to fork lots of code. 
It would be much easier to add the feature to the existing implementations.


Regards,

Rainer

-
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org