Hi, I'd like to discuss RAT-325 here, as I think that this is the
proper place for such discussions.

Let me start by giving an outline of the information that I currently have:

In RAT-325, the original reporter claimed to see extremely different
performance between 0.15 and 0.16. This claim was later confirmed by
another user, who also described how to reproduce the issue on the
source code of Apache OpenMeetings. Using that description, I was able
to confirm that there is, indeed, a massive gap.

The discussion quickly concentrated on the SPDX support (more
precisely, the RegExp handling) as the most likely suspect. My
understanding is that this feature was introduced in 0.16, so
the assumption appears natural. On the other hand, as far as I
can tell, no evidence has been given so far that actually confirms it.

In order to get some hard data, I did an experiment by changing the
source code of SPDXMatcherFactory.check(String,Match) as follows:

    private long totalCalls = 0;
    private long totalTime = 0;

    private boolean check(String line, Match caller) {
        final long startTime = System.currentTimeMillis();
        /* Real code follows here, creating a boolean variable result. */
        final long endTime = System.currentTimeMillis();
        totalTime += (endTime - startTime);
        ++totalCalls;
        System.out.println("check: totalCalls=" + totalCalls
                + ", totalTime=" + totalTime);
        return result;
    }
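
One way to narrow this down further would be to report only the
unusually slow calls, together with some hint about the input line.
The following is just a sketch of that idea, not code from RAT: the
class SlowCallTimer, its methods, and the threshold parameter are
hypothetical names of mine, and the real body of check would be passed
in as the BooleanSupplier.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper (not part of Apache RAT): times calls, reports slow ones. */
final class SlowCallTimer {
    private final long thresholdNanos;
    private long totalCalls = 0;
    private long totalNanos = 0;
    private long slowCalls = 0;

    /** thresholdMillis: calls taking at least this long are reported. */
    SlowCallTimer(long thresholdMillis) {
        this.thresholdNanos = thresholdMillis * 1_000_000L;
    }

    /** Runs realCheck, records its duration, and passes its result through. */
    boolean time(String line, BooleanSupplier realCheck) {
        final long start = System.nanoTime();
        try {
            return realCheck.getAsBoolean();
        } finally {
            final long elapsed = System.nanoTime() - start;
            totalNanos += elapsed;
            ++totalCalls;
            if (elapsed >= thresholdNanos) {
                ++slowCalls;
                // Print only the outliers, so the output stays readable.
                System.out.println("slow check: call #" + totalCalls
                        + ", millis=" + (elapsed / 1_000_000L)
                        + ", line length=" + line.length());
            }
        }
    }

    long totalCalls() { return totalCalls; }
    long totalMillis() { return totalNanos / 1_000_000L; }
    long slowCalls() { return slowCalls; }
}
```

With a threshold of, say, 1000 ms, this would print a line only for
calls like the ones starting at totalCalls=377964, and the line length
might already hint at what is special about that input.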

My assumption was: If the RegExp code (which is used in that method) is
the problem, then I would see the variable totalTime rise very
quickly, and roughly linearly with the variable totalCalls. However,
that is not the case. Quoting from the output of "mvn clean
apache-rat:0.16.1-SNAPSHOT:check" in openmeetings/openmeetings-web, I
see:

    check: totalCalls=377961, totalTime=6018
    check: totalCalls=377962, totalTime=6018
    check: totalCalls=377963, totalTime=6018
    check: totalCalls=377964, totalTime=53385
    check: totalCalls=377965, totalTime=97949
    check: totalCalls=377966, totalTime=151063
    check: totalCalls=377967, totalTime=197750

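In summary, over the first 377963 calls, the performance is just fine:
they take about 6 seconds (6018 ms) in total, so totalTime grows much
more slowly than totalCalls. However, beginning with totalCalls=377964,
the picture changes completely: each of the next four calls adds
roughly 45 to 53 seconds.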
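
The cost of each individual call can be read off the output by
subtracting consecutive totalTime values. A small sketch that does this
for the lines quoted above (the class CheckLogDeltas is a hypothetical
helper of mine, and it assumes exactly the output format of my
println):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: per-call durations from the "check: ..." output. */
final class CheckLogDeltas {
    private static final Pattern LINE =
            Pattern.compile("check: totalCalls=(\\d+), totalTime=(\\d+)");

    /** Returns {call number, milliseconds spent in that call} pairs. */
    static List<long[]> deltas(List<String> lines) {
        final List<long[]> result = new ArrayList<>();
        long previousTime = -1; // no previous line seen yet
        for (String line : lines) {
            final Matcher m = LINE.matcher(line.trim());
            if (!m.matches()) {
                continue; // skip unrelated Maven output
            }
            final long calls = Long.parseLong(m.group(1));
            final long time = Long.parseLong(m.group(2));
            if (previousTime >= 0) {
                result.add(new long[] {calls, time - previousTime});
            }
            previousTime = time;
        }
        return result;
    }

    public static void main(String[] args) {
        final List<String> log = Arrays.asList(
                "check: totalCalls=377961, totalTime=6018",
                "check: totalCalls=377962, totalTime=6018",
                "check: totalCalls=377963, totalTime=6018",
                "check: totalCalls=377964, totalTime=53385",
                "check: totalCalls=377965, totalTime=97949",
                "check: totalCalls=377966, totalTime=151063",
                "check: totalCalls=377967, totalTime=197750");
        for (long[] d : deltas(log)) {
            System.out.println("call " + d[0] + ": " + d[1] + " ms");
        }
    }
}
```

For the seven lines above, this gives 0 ms for calls 377962 and 377963,
and 47367, 44564, 53114, and 46687 ms for calls 377964 to 377967.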

These results are, of course, strictly local to my machine (a rather
limited Chromebook), and perhaps not reproducible elsewhere. However,
if they are, then there is something going on that I do not really
understand.

So, please try to reproduce this, and let me know your results and/or ideas.

Thanks,

Jochen

-- 
The woman was born in a full-blown thunderstorm. She probably told it
to be quiet. It probably did. (Robert Jordan, Winter's heart)
