Improve linkcheck performance (2x+) getting rid of jtidy dependency via regexps
-------------------------------------------------------------------------------

         Key: MPLINKCHECK-23
         URL: http://jira.codehaus.org/browse/MPLINKCHECK-23
     Project: maven-linkcheck-plugin
        Type: Improvement
    Versions: 1.3.4
    Reporter: Ignacio G. Mac Dowell
 Attachments: linkcheck.patch

At the moment, the linkcheck plugin uses jtidy and xpath for retrieving all 
links. IMHO regexps would work much faster/better than the jtidy-xpath 
combination.

The following regexp would be a replacement for the xpath expressions:

<(?>link|a|img|script)[^>]*?(?>href|src)\s*?=\s*?[\"'](.*?)[\"'][^>]*?
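
As a rough illustration of how this regexp could be applied, here is a minimal 
sketch using java.util.regex. The class name LinkExtractor and the method 
extractLinks are hypothetical and are not taken from the attached patch; only 
the pattern itself comes from this report.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the plugin or of linkcheck.patch.
final class LinkExtractor {

    // The regexp from this report, written as a Java string literal.
    // Capturing group 1 holds the value of the href/src attribute.
    private static final Pattern LINK = Pattern.compile(
        "<(?>link|a|img|script)[^>]*?(?>href|src)\\s*?=\\s*?[\"'](.*?)[\"'][^>]*?");

    // Returns every href/src value found in the given HTML, in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<String>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"index.html\">Home</a> <img src='logo.png'>";
        System.out.println(extractLinks(html));
    }
}
```

The sketch compiles the pattern once and reuses it for every page, which is 
probably where most of the gain over building a DOM per page would come from.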

All tests pass with this regexp. In the ws-jaxme project I get these results 
when running the goals maven-linkcheck-plugin:clearcache 
maven-linkcheck-plugin:report-real:

with jtidy/xpath: Total time: 2 minutes 43 seconds
with regexps: Total time: 1 minutes 10 seconds

I am sure some regexp guru can improve the performance of this.

I have a question, though. Are mailto links supposed to count as checkable? IMO 
no.
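
If the answer is no, skipping them could look roughly like the sketch below. 
The class MailtoFilter and its methods are hypothetical names used only for 
illustration; nothing here reflects how the plugin currently treats mailto 
links.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical filter, assuming mailto links should NOT be checked.
final class MailtoFilter {

    // A link is treated as checkable unless it uses the mailto: scheme.
    static boolean isCheckable(String link) {
        return !link.trim().toLowerCase(Locale.ROOT).startsWith("mailto:");
    }

    // Returns the input links with all mailto: links dropped, order preserved.
    static List<String> checkableLinks(List<String> links) {
        List<String> result = new ArrayList<String>();
        for (String link : links) {
            if (isCheckable(link)) {
                result.add(link);
            }
        }
        return result;
    }
}
```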

PS: Also, IMO the createDocument method in LinkCheck should use a try/finally 
block.



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to