I went ahead and made some performance improvements. I hope this didn't step on the toe of Michael.

My CVS knowledge is rusty, but there's inherent problem in retaining the entire "cvs rlog" output in memory. CVS rlog returns some data for every file in the repository, so on a large repository, it'll produce huge data even if there haven't been many files that have changed. I fixed this.

I'm also bit disappointed that the optimization we had in the CVS plugin 1.x appears to be gone, where we observed what files have changed during update and use that list to reduce the target of the log command. This was very useful for typical CI situation where your changes between builds are small. I suspect this also has a performance implication.

Finally, I'd like to advise against relying regular _expression_ matching on entire "cvs log" output block, as it is both error prone and fragile.

When I look at https://github.com/jenkinsci/cvs-plugin/blob/master/src/main/java/hudson/scm/CvsLog.java#L273 I spot a number of problems right away — for example,

[\r|\n]+
is a mistake of
[\r\n]+
and there are various
.+?
that can incorrectly match multiple lines when the commit message contains stuff that looks suspiciously like cvs log output.


And in some cases regular _expression_ matching will result in very inefficient backtracking, which is what I suspect to be the cause when people report "cvs rlog stuck".

I thought we used to bundle package-renamed CVS log parsing code taken from Ant, which I thought did the job better. I'm curious what the motivation is for replacing that with a custom parsing code.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to